JOJO fea932425a chore: initial import		2025-11-14 16:44:12 +08:00
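Content Automation Agent

A complete automated content creation system: it crawls content from the web, extracts the usable copy, rewrites it with AI, and generates a voice-over or video file.

🚀 Features

🕷️ Smart crawler: fetches content from multiple platforms and includes strategies for avoiding anti-crawler measures
📝 Content extraction: extracts the meaningful copy from a web page
🤖 AI rewriting: rewrites content with a large language model
🎙️ Voice generation: supports several speech synthesis services (Azure, Google, ElevenLabs)
🎬 Video generation: converts text content into a video file
⚡ Batch processing: processes multiple URLs in one run, with concurrent execution
🔧 Highly configurable: flexible configuration file system

📦 Installation

1. Clone the project

git clone <repository URL>
cd content_automation_agent

2. Install dependencies

pip install -r requirements.txt

3. Configure API keys

Edit config/config.yaml and add your API keys: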

api:
  openai:
    api_key: "your_openai_api_key_here"
  
  voice:
    provider: "azure"
    azure:
      subscription_key: "your_azure_speech_key"
      region: "eastus"

Alternatively, use environment variables:

export OPENAI_API_KEY="your_openai_api_key"
export AZURE_SPEECH_KEY="your_azure_speech_key"
export AZURE_SPEECH_REGION="eastus"
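
The two configuration routes above can be combined in code. The following is a minimal sketch, assuming that a non-empty environment variable takes precedence over the value in config/config.yaml; the README does not state the precedence, and `resolve_api_key` is a hypothetical helper, not part of the project.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    env_name: str,
    config_value: Optional[str],
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Return the key from the environment, falling back to the config value."""
    # Assumption: a non-empty environment variable overrides the config file.
    value = environ.get(env_name) or config_value
    if not value or value.startswith("your_"):
        # The placeholder values in this README all start with "your_",
        # so they are treated as "not configured".
        raise ValueError(f"{env_name} is not configured")
    return value
```

For example, `resolve_api_key("OPENAI_API_KEY", config["api"]["openai"]["api_key"])` would return the exported key when one is set and the YAML value otherwise.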

🎯 Quick Start

Basic usage

1. Generate a voice-over file

python main.py https://example.com/article --format voice

2. Generate a video file

python main.py https://example.com/article --format video

3. Batch processing

python main.py https://url1.com https://url2.com https://url3.com --format voice

4. Extract text only

python main.py https://example.com/article --format text
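
The commands above imply one or more positional URLs plus a --format option with the values voice, video and text. The sketch below shows how such a command line could be parsed with argparse. The actual argument handling in main.py is not shown in this README, so `build_parser`, the default format and the help texts are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the command lines shown above."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Crawl URLs and produce text, voice or video output.",
    )
    # One or more URLs; several URLs mean batch processing.
    parser.add_argument("urls", nargs="+", metavar="URL", help="URL(s) to process")
    parser.add_argument(
        "--format",
        dest="output_format",
        choices=["voice", "video", "text"],
        default="voice",  # assumption: the real default is not documented
        help="type of output to generate",
    )
    return parser
```

Calling `build_parser().parse_args()` then yields `args.urls` as a list and `args.output_format` as one of the three allowed strings.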

Programming interface

import asyncio
from main import ContentAutomationAgent

async def main():
    # Initialize the agent
    agent = ContentAutomationAgent()

    # Process a single URL
    result = await agent.process_url(
        url="https://example.com/article",
        output_format="voice"  # or "video", "text"
    )

    print(f"Generated files: {result['output_files']}")

# Run
asyncio.run(main())

⚙️ Configuration

Configuration file structure

# API settings
api:
  openai:
    api_key: "your_api_key"
    model: "gpt-3.5-turbo"
    max_tokens: 2000
    temperature: 0.7

# Crawler settings
crawler:
  user_agent: "Mozilla/5.0..."
  delay_range: [1, 3]
  max_retries: 3
  timeout: 30

# Content extraction settings
extraction:
  min_content_length: 100
  max_content_length: 5000
  remove_ads: true

# Rewriting settings
rewriting:
  style_options: ["正式", "口语化", "幽默", "专业", "简洁"]
  target_length: 800

# Voice generation settings
voice_generation:
  output_format: "mp3"
  sample_rate: 24000

# Video generation settings
video_generation:
  resolution: "1920x1080"
  fps: 30
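
A partial config file is easier to work with when missing keys fall back to defaults. The sketch below merges user overrides onto a defaults dictionary after the YAML has been parsed into a plain dict (for instance with PyYAML's `yaml.safe_load`). Whether the project merges its configuration this way is not stated; `DEFAULTS` copies a subset of the values listed above, and `merge_config` is a hypothetical helper.

```python
from copy import deepcopy
from typing import Any, Dict

# Subset of the values from the configuration structure above.
DEFAULTS: Dict[str, Any] = {
    "crawler": {"delay_range": [1, 3], "max_retries": 3, "timeout": 30},
    "extraction": {
        "min_content_length": 100,
        "max_content_length": 5000,
        "remove_ads": True,
    },
}


def merge_config(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively overlay `overrides` on `defaults` without mutating either."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested sections are merged key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalars and lists replace the default outright.
            merged[key] = deepcopy(value)
    return merged
```

With this approach a user file that only sets `crawler.timeout` still ends up with `max_retries` and `delay_range` populated.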

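🎨 Supported rewriting styles

Formal (正式): suited to official and business settings
Conversational (口语化): everyday spoken style, easy to understand
Humorous (幽默): light, lively and entertaining
Professional (专业): uses domain terminology and in-depth analysis
Concise (简洁): brief and to the point, with the key points emphasized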
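
One way to turn a chosen style into instructions for the language model is a lookup table keyed by the style names from `style_options`. This is only a sketch: the prompt wording, the `build_rewrite_prompt` helper and the reading of `target_length` as a character count are assumptions, since the README does not show the project's actual prompts.

```python
STYLE_INSTRUCTIONS = {
    "正式": "Use a formal tone suited to official and business settings.",
    "口语化": "Use an everyday conversational tone that is easy to understand.",
    "幽默": "Use a light, lively and humorous tone.",
    "专业": "Use domain terminology and provide in-depth analysis.",
    "简洁": "Be brief and emphasize the key points.",
}


def build_rewrite_prompt(text: str, style: str, target_length: int = 800) -> str:
    """Compose a rewriting prompt for one of the configured styles."""
    if style not in STYLE_INSTRUCTIONS:
        raise ValueError(f"unsupported style: {style!r}")
    return (
        f"Rewrite the following text. {STYLE_INSTRUCTIONS[style]} "
        # Assumption: target_length is measured in characters.
        f"Aim for roughly {target_length} characters.\n\n{text}"
    )
```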

🌐 Supported platforms

Social media

Weibo (weibo.com)
Zhihu (zhihu.com)
Bilibili (bilibili.com)
Xiaohongshu (xiaohongshu.com)

News sites

Sina News
NetEase News
Tencent News
Toutiao

Technical blogs

Juejin (juejin.cn)
SegmentFault
CSDN
GitHub
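
Platform-specific handling usually starts by mapping a URL's host name to a platform. The sketch below does this for the domains that are spelled out in the list above; `PLATFORM_DOMAINS` and `detect_platform` are hypothetical names, and how the project itself selects a crawler per platform is not documented here.

```python
from typing import Optional
from urllib.parse import urlparse

# Only the domains given explicitly in the list above.
PLATFORM_DOMAINS = {
    "weibo.com": "Weibo",
    "zhihu.com": "Zhihu",
    "bilibili.com": "Bilibili",
    "xiaohongshu.com": "Xiaohongshu",
    "juejin.cn": "Juejin",
}


def detect_platform(url: str) -> Optional[str]:
    """Return the platform name for a URL, or None if the host is unknown."""
    host = (urlparse(url).hostname or "").lower()
    for domain, name in PLATFORM_DOMAINS.items():
        # Match the bare domain and any subdomain (www., m., ...),
        # but not look-alike hosts such as "notweibo.com".
        if host == domain or host.endswith("." + domain):
            return name
    return None
```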

🔧 Advanced usage

Custom voice parameters

voice_params = {
    'voice_name': 'zh-CN-XiaoxiaoNeural',
    'speed': 1.2,
    'language_code': 'zh-CN'
}

# Run inside an async function; `agent` is a ContentAutomationAgent instance
result = await agent.process_url(
    url="https://example.com",
    output_format="voice",
    voice_params=voice_params
)
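
With the Azure provider, parameters like these are typically expressed as SSML. The sketch below builds an SSML document from `voice_params`, translating `speed` into a relative `prosody` rate (1.2 becomes +20%). How the project actually maps these parameters to each provider is not shown in this README, so `build_ssml` and the speed-to-rate conversion are assumptions.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice_params: dict) -> str:
    """Build an SSML string from the voice parameters shown above."""
    voice = voice_params.get("voice_name", "zh-CN-XiaoxiaoNeural")
    lang = voice_params.get("language_code", "zh-CN")
    speed = float(voice_params.get("speed", 1.0))
    # Assumption: speed is a multiplier, so 1.2 means 20% faster than normal.
    rate = f"{round((speed - 1.0) * 100):+d}%"
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>"
        # escape() protects against &, < and > in the input text.
        f'<prosody rate="{rate}">{escape(text)}</prosody>'
        "</voice></speak>"
    )
```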

Custom video parameters

video_params = {
    'resolution': '1280x720',
    'background_color': '#1a1a1a',
    'text_color': '#ffffff',
    'font_size': 28,
    'duration_per_sentence': 4
}

# Run inside an async function; `agent` is a ContentAutomationAgent instance
result = await agent.process_url(
    url="https://example.com",
    output_format="video",
    video_params=video_params
)
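
Two of these parameters need interpretation before rendering: `resolution` is a "WIDTHxHEIGHT" string, and `duration_per_sentence` determines the total video length. The sketch below parses the former and estimates the latter. It assumes that `duration_per_sentence` is in seconds and that sentences end at Chinese or Western sentence punctuation; both helpers are hypothetical and the project's own splitting rules may differ.

```python
import re
from typing import Tuple


def parse_resolution(resolution: str) -> Tuple[int, int]:
    """Turn a string such as "1280x720" into (width, height)."""
    match = re.fullmatch(r"(\d+)x(\d+)", resolution)
    if not match:
        raise ValueError(f"invalid resolution: {resolution!r}")
    return int(match.group(1)), int(match.group(2))


def estimate_duration(text: str, duration_per_sentence: float) -> float:
    """Estimate video length as sentence count times duration per sentence."""
    # Naive split on sentence-ending punctuation; empty pieces are dropped.
    sentences = [s for s in re.split(r"[。！？.!?]+", text) if s.strip()]
    return len(sentences) * duration_per_sentence
```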

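Asynchronous batch processing

urls = [
    "https://url1.com",
    "https://url2.com",
    "https://url3.com"
]

# Run inside an async function; `agent` is a ContentAutomationAgent instance
results = await agent.process_batch(
    urls=urls,
    output_format="voice"
)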
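
Batch processing with a cap on simultaneous requests can be sketched with `asyncio.Semaphore` and `asyncio.gather`. This is not the implementation of `process_batch`, which is not shown in this README; `process_batch_bounded`, the `max_concurrency` parameter and the shape of the result dictionaries are assumptions made for illustration.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List


async def process_batch_bounded(
    urls: List[str],
    process_one: Callable[[str], Awaitable[Any]],
    max_concurrency: int = 3,
) -> List[Dict[str, Any]]:
    """Process URLs concurrently, with at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(url: str) -> Dict[str, Any]:
        async with semaphore:
            try:
                return {"url": url, "success": True, "result": await process_one(url)}
            except Exception as exc:
                # One failing URL must not abort the rest of the batch.
                return {"url": url, "success": False, "error": str(exc)}

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(guarded(url) for url in urls))
```

Here `process_one` could be something like `lambda url: agent.process_url(url=url, output_format="voice")`.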

📊 Output files

Directory structure

output/
├── voice/
│   ├── voice_20240101_120000.mp3
│   └── voice_20240101_120001.mp3
├── video/
│   ├── video_20240101_120000.mp4
│   └── video_20240101_120001.mp4
└── report_20240101_120000.txt  # processing report
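
The file names in the tree follow a `<kind>_<YYYYMMDD>_<HHMMSS>.<ext>` pattern. The sketch below reproduces that pattern; `output_path` and `EXTENSIONS` are hypothetical, and the project may derive its names differently (for example to avoid collisions within the same second).

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

EXTENSIONS = {"voice": "mp3", "video": "mp4"}


def output_path(kind: str, now: Optional[datetime] = None, root: Path = Path("output")) -> Path:
    """Build a timestamped path such as output/voice/voice_20240101_120000.mp3."""
    now = now or datetime.now()
    # datetime supports strftime codes directly in a format spec.
    return root / kind / f"{kind}_{now:%Y%m%d_%H%M%S}.{EXTENSIONS[kind]}"
```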

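Processing report

Each batch run produces a detailed processing report that contains:

The list of URLs processed successfully
The URLs that failed, with the reason for each failure
The paths of the generated files
Content statistics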
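
A report with the items listed above could be assembled from per-URL result dictionaries as sketched below. Only the `output_files` key appears in this README; the `url`, `success` and `error` keys, the report layout and the `format_report` helper are assumptions.

```python
from typing import Any, Dict, List


def format_report(results: List[Dict[str, Any]]) -> str:
    """Render a plain-text summary of a batch run."""
    succeeded = [r for r in results if r.get("success")]
    failed = [r for r in results if not r.get("success")]
    lines = [
        f"Processed: {len(results)}  Succeeded: {len(succeeded)}  Failed: {len(failed)}",
        "",
        "Succeeded:",
    ]
    for r in succeeded:
        files = ", ".join(r.get("output_files", []))
        lines.append(f"  {r['url']} -> {files}")
    lines += ["", "Failed:"]
    for r in failed:
        lines.append(f"  {r['url']}: {r.get('error', 'unknown error')}")
    return "\n".join(lines)
```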

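🛡️ Anti-crawler countermeasures

Random delays: the interval between requests is randomized
User-Agent rotation: requests imitate different browsers
Concurrency limit: the number of simultaneous requests is capped
Retry mechanism: failed requests are retried automatically
SSL certificate handling: handles HTTPS certificate verification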
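
Three of the strategies above (random delays, User-Agent rotation and retries) can be sketched together. The transport is passed in as a `fetch` callable so the example stays independent of any HTTP library. The `USER_AGENTS` strings are illustrative, `fetch_with_retries` is a hypothetical helper, and treating `max_retries` as the total number of attempts is an assumption.

```python
import random
import time
from typing import Callable, Dict, Tuple

# Illustrative values only; a real deployment would maintain its own list.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]


def fetch_with_retries(
    url: str,
    fetch: Callable[[str, Dict[str, str]], str],
    delay_range: Tuple[float, float] = (1.0, 3.0),
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `fetch` up to `max_retries` times with a rotating User-Agent."""
    last_error = None
    for attempt in range(1, max_retries + 1):
        headers = {"User-Agent": random.choice(USER_AGENTS)}
        try:
            return fetch(url, headers)
        except Exception as exc:
            last_error = exc
            if attempt < max_retries:
                # Random pause before the next attempt.
                sleep(random.uniform(*delay_range))
    raise RuntimeError(f"giving up on {url} after {max_retries} attempts") from last_error
```

The defaults mirror `delay_range: [1, 3]` and `max_retries: 3` from the crawler configuration.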

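🚨 Notes

API quotas: keep an eye on the call quota of each API service
Network latency: set a suitable delay when processing in batches
Content copyright: make sure the content you crawl does not infringe copyright
Usage rules: comply with each platform's terms of service and robots.txt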
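
Checking a URL against robots.txt can be done with the standard library's `urllib.robotparser`. The sketch below takes the robots.txt content as a string so it can be run offline; `is_allowed` is a hypothetical helper, and the README does not say whether the project performs this check itself.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    # parse() accepts the robots.txt content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Against a live site, `RobotFileParser.set_url()` followed by `read()` downloads the rules instead of parsing a string.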

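🔍 Troubleshooting

Common problems

1. Crawling fails

Check the network connection
Confirm that the URL is reachable
Check whether an anti-crawler mechanism was triggered

2. API calls fail

Verify that the API key is correct
Check whether the API quota is exhausted
Confirm that the network can reach the API service

3. Content extraction fails

Check whether the page structure has changed
Confirm that the content length is within the configured limits
Check the logs for the specific cause

4. Voice or video generation fails

Check the permissions of the output directory
Confirm that there is enough disk space
Verify that the dependencies are installed correctly

Viewing logs

See the log files in the logs/ directory for detailed error information.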

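🤝 Contributing

Issues and pull requests that improve this project are welcome!

📄 License

MIT License

📞 Support

If you run into problems, please open an issue on GitHub.

⭐ If this project helps you, please give it a star!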
