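# Tutorial: Calling the DeepSeek-OCR Model on SiliconFlow with Python

## 📖 Introduction

DeepSeek-OCR is a vision-language model built for optical character recognition (OCR) and document understanding. Through the SiliconFlow platform, you can call it over an API to analyze image content.
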
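## 🎯 Model Features

- **High accuracy**: retains about 97% recognition precision at a 10× compression ratio
- **Multilingual support**: recognizes 100+ languages
- **Multiple modes**: document-to-Markdown conversion, table recognition, handwriting recognition, and more
- **Efficient processing**: visual token compression greatly reduces token consumption
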
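## 🚀 Quick Start

### 1. Preparation

#### 1.1 Register a SiliconFlow account and get an API key

1. Visit the [SiliconFlow website](https://cloud.siliconflow.cn/)
2. Register an account (new users receive 20 million free tokens)
3. Open the "API Keys" (API密钥) page and click "Create API Key" (新建API密钥)
4. Copy and save your API key (format: `sk-xxxxx`)
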
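
The examples in this tutorial hardcode the key for brevity. A minimal sketch of reading it from an environment variable instead, so it stays out of source control; the variable name `SILICONFLOW_API_KEY` and the helper `load_api_key` are assumptions made for illustration, not names defined by SiliconFlow:

```python
import os


def load_api_key(env_var: str = "SILICONFLOW_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    The variable name is a hypothetical convention for this tutorial.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {env_var} is not set; "
            "export it before running the examples."
        )
    return key
```

With this in place, `OpenAI(api_key=load_api_key(), base_url="https://api.siliconflow.cn/v1")` can replace the hardcoded string.
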
#### 1.2 Install the required Python libraries

```bash
pip install openai pillow requests
```

### 2. Basic Usage Examples

#### 2.1 OCR on a local image

```python
"""
Example 1: Basic OCR - recognize the text in an image
Use cases: document scans, extracting text from screenshots
"""
import base64

from openai import OpenAI

# Configure the API client
client = OpenAI(
    api_key="sk-your-api-key-here",  # Replace with your API key
    base_url="https://api.siliconflow.cn/v1"
)

# Convert an image to a base64 string
def encode_image(image_path):
    """Read an image file and return its base64 encoding."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

# Prepare the image
image_path = "your_image.jpg"  # Replace with your image path
base64_image = encode_image(image_path)

# Call the API to run OCR
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-OCR",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                },
                {
                    "type": "text",
                    # Prompt: "Please recognize all text in the image"
                    "text": "请识别图片中的所有文字内容"
                }
            ]
        }
    ],
    max_tokens=4096,
    temperature=0.7
)

# Print the result
print("OCR result:")
print(response.choices[0].message.content)
```

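
The example above labels every image as `image/jpeg` in the data URL, which is inaccurate for PNG or BMP files. A minimal sketch of deriving the MIME type from the file extension with the standard library; `image_to_data_url` is a hypothetical helper name, and whether the service actually rejects a mismatched MIME type is not something this tutorial states:

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(image_path: str) -> str:
    """Build a base64 data URL whose MIME type matches the file extension."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognized image type: {image_path}")
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string can be passed directly as the `"url"` value of the `image_url` content part.
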
#### 2.2 Recognition from an image URL

```python
"""
Example 2: Recognition from an image URL
Use cases: processing web images, analyzing remote images
"""
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-api-key-here",
    base_url="https://api.siliconflow.cn/v1"
)

# Use an image URL
image_url = "https://example.com/image.jpg"  # Replace with a real image URL

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-OCR",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": image_url
                    }
                },
                {
                    "type": "text",
                    # Prompt: "Please describe the content of this image"
                    "text": "请描述这张图片的内容"
                }
            ]
        }
    ],
    max_tokens=4096
)

print(response.choices[0].message.content)
```

### 3. Advanced Usage Scenarios

#### 3.1 Converting a document to Markdown

```python
"""
Example 3: Convert a document image to Markdown
Use cases: document digitization, organizing notes
"""
from openai import OpenAI
import base64

def document_to_markdown(image_path, api_key):
    """Convert a document image to Markdown."""

    client = OpenAI(
        api_key=api_key,
        base_url="https://api.siliconflow.cn/v1"
    )

    # Read and encode the image
    with open(image_path, "rb") as image_file:
        base64_image = base64.b64encode(image_file.read()).decode('utf-8')

    # Call the API with the dedicated Markdown-conversion prompt
    response = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-OCR",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{base64_image}"
                        }
                    },
                    {
                        "type": "text",
                        "text": "<image>\n<|grounding|>Convert the document to markdown."
                    }
                ]
            }
        ],
        max_tokens=8192,
        temperature=0
    )

    return response.choices[0].message.content

# Usage example
api_key = "sk-your-api-key-here"
markdown_result = document_to_markdown("document.jpg", api_key)
print(markdown_result)

# Optional: save the result as a Markdown file
with open("output.md", "w", encoding="utf-8") as f:
    f.write(markdown_result)
```

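
With the `<|grounding|>` prompt, the model may interleave layout annotations of the form `<|ref|>label<|/ref|><|det|>[[x1, y1, x2, y2]]<|/det|>` with the Markdown text. Whether the SiliconFlow endpoint returns these tags unmodified is an assumption here, so treat the following as a hedged post-processing sketch; `strip_grounding_tags` is a hypothetical helper name:

```python
import re

# Matches one annotation pair plus the newline that usually follows it.
# The tag format is assumed; adjust the pattern if your output differs.
GROUNDING_TAG = re.compile(
    r"<\|ref\|>.*?<\|/ref\|><\|det\|>.*?<\|/det\|>\n?", re.DOTALL
)


def strip_grounding_tags(text: str) -> str:
    """Remove grounding annotations and return the remaining Markdown."""
    return GROUNDING_TAG.sub("", text).strip()
```

If the response contains no such tags, the function returns the text unchanged apart from trimming surrounding whitespace.
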
#### 3.2 Table recognition and extraction

```python
"""
Example 4: Recognize a table in an image and convert it to structured data
Use cases: financial statements, processing data tables
"""
from openai import OpenAI
import base64

def extract_table(image_path, api_key):
    """Extract table data from an image."""

    client = OpenAI(
        api_key=api_key,
        base_url="https://api.siliconflow.cn/v1"
    )

    with open(image_path, "rb") as image_file:
        base64_image = base64.b64encode(image_file.read()).decode('utf-8')

    response = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-OCR",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{base64_image}"
                        }
                    },
                    {
                        "type": "text",
                        # Prompt: "Please recognize the table in the image
                        # and output it as a Markdown table"
                        "text": "请识别图片中的表格,并以Markdown表格格式输出"
                    }
                ]
            }
        ],
        max_tokens=4096
    )

    return response.choices[0].message.content

# Usage example
result = extract_table("table_image.jpg", "sk-your-api-key-here")
print(result)
```

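
The function above returns the table as Markdown text. To work with it as structured data, the text can be parsed into rows. The sketch below assumes the model emits a plain pipe-delimited table with a header row and no escaped `|` characters inside cells; `parse_markdown_table` is a hypothetical helper, not part of any SDK:

```python
from typing import Dict, List


def parse_markdown_table(markdown: str) -> List[Dict[str, str]]:
    """Parse the first pipe-delimited Markdown table into a list of dicts."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        # Only lines that start and end with a pipe are treated as table rows
        if line.startswith("|") and line.endswith("|") and len(line) > 1:
            rows.append([cell.strip() for cell in line[1:-1].split("|")])
    if len(rows) < 2:
        return []
    header, body = rows[0], rows[1:]
    # Drop the separator row (cells such as --- or :---:) if present
    if body and all(cell and set(cell) <= set(":-") for cell in body[0]):
        body = body[1:]
    return [dict(zip(header, row)) for row in body]
```

Rows with more or fewer cells than the header are truncated by `zip`, so malformed model output degrades silently; add validation if that matters for your data.
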
#### 3.3 Handwriting recognition

```python
"""
Example 5: Recognize handwritten text
Use cases: handwritten notes, signature recognition
"""
from openai import OpenAI
import base64

def recognize_handwriting(image_path, api_key):
    """Recognize handwritten text in an image."""

    client = OpenAI(
        api_key=api_key,
        base_url="https://api.siliconflow.cn/v1"
    )

    with open(image_path, "rb") as image_file:
        base64_image = base64.b64encode(image_file.read()).decode('utf-8')

    response = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-OCR",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{base64_image}"
                        }
                    },
                    {
                        "type": "text",
                        # Prompt: "Please carefully recognize all handwritten
                        # text in this image, including punctuation"
                        "text": "请仔细识别这张图片中的所有手写文字内容,包括标点符号"
                    }
                ]
            }
        ],
        max_tokens=4096,
        temperature=0.3
    )

    return response.choices[0].message.content

# Usage example
handwriting_text = recognize_handwriting("handwriting.jpg", "sk-your-api-key-here")
print("Recognized handwriting:")
print(handwriting_text)
```

#### 3.4 Batch image processing

```python
"""
Example 6: Batch-process multiple images
Use cases: bulk document processing, automated workflows
"""
from openai import OpenAI
import base64
import os
from pathlib import Path

def batch_ocr_process(image_folder, api_key, output_folder="output"):
    """Run OCR on every image in a folder."""

    client = OpenAI(
        api_key=api_key,
        base_url="https://api.siliconflow.cn/v1"
    )

    # Create the output folder (including missing parent folders)
    Path(output_folder).mkdir(parents=True, exist_ok=True)

    # Collect all image files
    image_extensions = ['.jpg', '.jpeg', '.png', '.bmp', '.webp']
    image_files = [f for f in os.listdir(image_folder)
                   if any(f.lower().endswith(ext) for ext in image_extensions)]

    results = []

    for idx, image_file in enumerate(image_files, 1):
        print(f"Processing image {idx}/{len(image_files)}: {image_file}")

        image_path = os.path.join(image_folder, image_file)

        try:
            # Read and encode the image
            with open(image_path, "rb") as f:
                base64_image = base64.b64encode(f.read()).decode('utf-8')

            # Call the OCR API
            response = client.chat.completions.create(
                model="deepseek-ai/DeepSeek-OCR",
                messages=[
                    {
                        "role": "user",
                        "content": [
                            {
                                "type": "image_url",
                                "image_url": {
                                    "url": f"data:image/jpeg;base64,{base64_image}"
                                }
                            },
                            {
                                "type": "text",
                                # Prompt: "Please recognize all text in the image"
                                "text": "请识别图片中的所有文字内容"
                            }
                        ]
                    }
                ],
                max_tokens=4096
            )

            result_text = response.choices[0].message.content

            # Save the result to a text file
            output_file = os.path.join(output_folder, f"{Path(image_file).stem}.txt")
            with open(output_file, "w", encoding="utf-8") as f:
                f.write(result_text)

            results.append({
                "image": image_file,
                "status": "success",
                "output": output_file
            })

            print(f"✓ Done: {image_file} -> {output_file}")

        except Exception as e:
            # Record the failure and continue with the next image
            print(f"✗ Error: {image_file} - {str(e)}")
            results.append({
                "image": image_file,
                "status": "error",
                "error": str(e)
            })

    return results

# Usage example
api_key = "sk-your-api-key-here"
results = batch_ocr_process("images_folder", api_key, "ocr_results")

# Print summary statistics
success_count = sum(1 for r in results if r["status"] == "success")
print(f"\nFinished: {success_count}/{len(results)} images succeeded")
```

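
`os.listdir` returns entries in arbitrary order and does not distinguish files from directories, so batch runs can process images in a different order each time. A small sketch of a deterministic alternative using `pathlib`; `list_images` and `IMAGE_EXTENSIONS` are hypothetical names introduced for illustration:

```python
from pathlib import Path
from typing import List

# Same extensions as the batch example above
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def list_images(folder: str) -> List[Path]:
    """Return the image files in a folder, sorted by path, skipping directories."""
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )
```

Each returned `Path` can be opened directly with `open(path, "rb")`, and `path.stem` gives the output file name without the extension.
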
### 4. A Complete Utility Class

```python
"""
Example 7: A wrapped OCR utility class
Provides a convenient interface for common OCR tasks
"""
import base64
import os
from typing import List, Dict
from openai import OpenAI
from pathlib import Path

class DeepSeekOCR:
    """DeepSeek-OCR utility class"""

    def __init__(self, api_key: str):
        """
        Initialize the OCR tool

        Args:
            api_key: SiliconFlow API key
        """
        self.client = OpenAI(
            api_key=api_key,
            base_url="https://api.siliconflow.cn/v1"
        )
        self.model = "deepseek-ai/DeepSeek-OCR"

    def _encode_image(self, image_path: str) -> str:
        """Encode an image file as base64"""
        with open(image_path, "rb") as image_file:
            return base64.b64encode(image_file.read()).decode('utf-8')

    def ocr(self,
            image_path: str,
            # Default prompt: "Please recognize all text in the image"
            prompt: str = "请识别图片中的所有文字内容",
            max_tokens: int = 4096,
            temperature: float = 0.7) -> str:
        """
        Basic OCR

        Args:
            image_path: path to the image
            prompt: prompt text
            max_tokens: maximum number of output tokens
            temperature: sampling temperature

        Returns:
            The recognized text
        """
        base64_image = self._encode_image(image_path)

        response = self.client.chat.completions.create(
            model=self.model,
            messages=[
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "image_url",
                            "image_url": {
                                "url": f"data:image/jpeg;base64,{base64_image}"
                            }
                        },
                        {
                            "type": "text",
                            "text": prompt
                        }
                    ]
                }
            ],
            max_tokens=max_tokens,
            temperature=temperature
        )

        return response.choices[0].message.content

    def to_markdown(self, image_path: str) -> str:
        """
        Convert a document image to Markdown

        Args:
            image_path: path to the image

        Returns:
            Markdown text
        """
        return self.ocr(
            image_path,
            prompt="<image>\n<|grounding|>Convert the document to markdown.",
            max_tokens=8192,
            temperature=0
        )

    def extract_table(self, image_path: str) -> str:
        """
        Extract table data

        Args:
            image_path: path to the image

        Returns:
            Table text (Markdown format)
        """
        return self.ocr(
            image_path,
            # Prompt: "Please recognize the table in the image
            # and output it as a Markdown table"
            prompt="请识别图片中的表格,并以Markdown表格格式输出",
            max_tokens=4096
        )

    def describe_image(self, image_path: str) -> str:
        """
        Describe the content of an image

        Args:
            image_path: path to the image

        Returns:
            Image description text
        """
        return self.ocr(
            image_path,
            # Prompt: "Please describe this image in detail, including
            # its main elements, colors, and layout"
            prompt="请详细描述这张图片的内容,包括主要元素、颜色、布局等",
            max_tokens=4096
        )

    def batch_process(self,
                      image_folder: str,
                      output_folder: str = "output",
                      save_format: str = "txt") -> List[Dict]:
        """
        Batch-process images

        Args:
            image_folder: path to the folder of images
            output_folder: path to the output folder
            save_format: output format ('txt' or 'md')

        Returns:
            List of per-image results
        """
        Path(output_folder).mkdir(parents=True, exist_ok=True)

        image_extensions = ['.jpg', '.jpeg', '.png', '.bmp', '.webp']
        image_files = [f for f in os.listdir(image_folder)
                       if any(f.lower().endswith(ext) for ext in image_extensions)]

        results = []

        for idx, image_file in enumerate(image_files, 1):
            print(f"Processing {idx}/{len(image_files)}: {image_file}")

            image_path = os.path.join(image_folder, image_file)

            try:
                result_text = self.ocr(image_path)

                output_file = os.path.join(
                    output_folder,
                    f"{Path(image_file).stem}.{save_format}"
                )

                with open(output_file, "w", encoding="utf-8") as f:
                    f.write(result_text)

                results.append({
                    "image": image_file,
                    "status": "success",
                    "output": output_file
                })

            except Exception as e:
                # Record the failure and continue with the next image
                results.append({
                    "image": image_file,
                    "status": "error",
                    "error": str(e)
                })

        return results

# Usage example
if __name__ == "__main__":
    # Initialize the tool
    ocr = DeepSeekOCR(api_key="sk-your-api-key-here")

    # Basic OCR
    text = ocr.ocr("document.jpg")
    print("OCR result:", text)

    # Convert to Markdown
    markdown = ocr.to_markdown("document.jpg")
    print("\nMarkdown:", markdown)

    # Extract a table
    table = ocr.extract_table("table.jpg")
    print("\nTable:", table)

    # Describe an image
    description = ocr.describe_image("photo.jpg")
    print("\nImage description:", description)

    # Batch processing
    results = ocr.batch_process("images", "results")
    print(f"\nBatch processing finished: {len(results)} images")
```

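## 📋 Prompt Reference

Depending on the use case, you can use the following prompts:

### Documents

```python
# Document to Markdown
"<image>\n<|grounding|>Convert the document to markdown."

# Free OCR (formatting not preserved)
"<image>\nFree OCR."

# OCR that preserves formatting
# ("Please recognize all text in the image and keep the original layout
# as far as possible")
"请识别图片中的所有文字内容,并尽可能保留原始排版格式"
```

### Tables

```python
# Table recognition
# ("Please recognize the table in the image and output it as a Markdown table")
"请识别图片中的表格,并以Markdown表格格式输出"

# Table data extraction
# ("Please extract the table data in the image as structured rows and columns")
"请提取图片中的表格数据,按行列结构化输出"
```

### Analysis

```python
# Image description
# ("Please describe the content of this image in detail")
"请详细描述这张图片的内容"

# Chart analysis
# ("Please analyze the chart data in the image and summarize the main trends")
"请分析图片中的图表数据,并总结主要趋势"

# Multilingual recognition
# ("Please recognize the text in the image and state which language it uses")
"请识别图片中的文字,并指出使用的语言"
```
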
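
To keep prompts in one place rather than scattered across call sites, they can be collected in a lookup table. This is only a sketch: the task names (`"markdown"`, `"free_ocr"`, and so on) and the helper `get_prompt` are hypothetical, while the prompt strings are the ones listed above:

```python
# Task names are illustrative; the prompt strings come from this tutorial.
PROMPTS = {
    "markdown": "<image>\n<|grounding|>Convert the document to markdown.",
    "free_ocr": "<image>\nFree OCR.",
    "table": "请识别图片中的表格,并以Markdown表格格式输出",
    "describe": "请详细描述这张图片的内容",
}


def get_prompt(task: str) -> str:
    """Return the prompt for a task name, or raise ValueError if unknown."""
    try:
        return PROMPTS[task]
    except KeyError:
        raise ValueError(
            f"Unknown task {task!r}; choose from {sorted(PROMPTS)}"
        ) from None
```
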

## ⚙️ Parameters

### model
- Value: `"deepseek-ai/DeepSeek-OCR"`
- Description: the model name; always use this value

### max_tokens
- Range: 1-8192
- Default: 4096
- Description: maximum number of output tokens; 8192 is recommended for complex documents

### temperature
- Range: 0-2
- Default: 0.7
- Description:
  - 0: the most deterministic output, suited to OCR
  - 0.7: balances creativity and accuracy
  - 1.0+: more creative, suited to descriptive tasks

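
The ranges above can be checked locally before a request is sent, which gives a clearer error than a failed API call. A minimal sketch, assuming the ranges listed in this section are accurate for the service; `validate_params` is a hypothetical helper:

```python
from typing import Dict, Union


def validate_params(max_tokens: int = 4096,
                    temperature: float = 0.7) -> Dict[str, Union[int, float]]:
    """Check parameters against the ranges listed in this tutorial."""
    if not 1 <= max_tokens <= 8192:
        raise ValueError(f"max_tokens must be in 1-8192, got {max_tokens}")
    if not 0 <= temperature <= 2:
        raise ValueError(f"temperature must be in 0-2, got {temperature}")
    return {"max_tokens": max_tokens, "temperature": temperature}
```

The returned dict can be unpacked into the call, for example `client.chat.completions.create(model=..., messages=..., **validate_params(8192, 0))`.
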
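## 💡 Best Practices

### 1. Image quality
- Use sharp images (recommended resolution above 1024x1024)
- Avoid heavily compressed images
- Make sure the text is clearly legible

### 2. Prompt design
- State the output format you need explicitly
- For specific content types (such as tables or code), say so in the prompt
- Prompts written in Chinese tend to work better
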
### 3. Error handling
```python
import base64  # needed for b64encode below
import time
from openai import OpenAI, APIError

def ocr_with_retry(image_path, api_key, max_retries=3):
    """OCR call with retries and exponential backoff."""
    client = OpenAI(api_key=api_key, base_url="https://api.siliconflow.cn/v1")

    for attempt in range(max_retries):
        try:
            with open(image_path, "rb") as f:
                base64_image = base64.b64encode(f.read()).decode('utf-8')

            response = client.chat.completions.create(
                model="deepseek-ai/DeepSeek-OCR",
                messages=[{
                    "role": "user",
                    "content": [{
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}
                    }, {
                        "type": "text",
                        # Prompt: "Please recognize all text in the image"
                        "text": "请识别图片中的所有文字内容"
                    }]
                }],
                max_tokens=4096
            )

            return response.choices[0].message.content

        except APIError as e:
            if attempt < max_retries - 1:
                # Wait 1s, 2s, 4s, ... between attempts
                wait_time = 2 ** attempt
                print(f"API error, retrying in {wait_time}s... ({attempt + 1}/{max_retries})")
                time.sleep(wait_time)
            else:
                print(f"Maximum retries reached, giving up: {str(e)}")
                raise
        except Exception as e:
            # Non-API errors (e.g. a missing file) are not retried
            print(f"Unexpected error: {str(e)}")
            raise
```

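
The retry logic above is tied to one specific OCR call. The same exponential-backoff pattern can be factored into a reusable wrapper; the sketch below is independent of the OpenAI SDK, and `retry_with_backoff` with its parameters is a hypothetical helper (the injectable `sleep` argument exists only to make it easy to test):

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(func: Callable[[], T],
                       max_retries: int = 3,
                       base_delay: float = 1.0,
                       retry_on: Tuple[Type[BaseException], ...] = (Exception,),
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func(), retrying on the given exceptions with doubling delays."""
    for attempt in range(max_retries):
        try:
            return func()
        except retry_on:
            # Re-raise once the final attempt has failed
            if attempt == max_retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_retries must be at least 1")
```

A call such as `retry_with_backoff(lambda: ocr.ocr("document.jpg"), retry_on=(APIError,))` would then retry only API errors, assuming `ocr` is an instance of a wrapper class like the one in section 4.
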
### 4. Token usage
- For simple text recognition, use a smaller max_tokens (such as 2048)
- For complex documents or tables, use a larger max_tokens (8192)
- Watch the API rate limits when batch processing

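
One way to choose between a small and a large `max_tokens` is to start small and check whether the output was cut off. In the OpenAI chat completions response format, `choices[0].finish_reason` is `"length"` when generation stopped at the token limit; that SiliconFlow reports it the same way is an assumption based on its OpenAI-compatible API. `was_truncated` and `next_max_tokens` are hypothetical helpers:

```python
def was_truncated(response) -> bool:
    """Return True if the first choice stopped because it hit max_tokens."""
    return response.choices[0].finish_reason == "length"


def next_max_tokens(current: int, limit: int = 8192) -> int:
    """Double the token budget, capped at the limit used in this tutorial."""
    return min(current * 2, limit)
```

If `was_truncated(response)` is true, the request can be repeated with `max_tokens=next_max_tokens(previous_value)`.
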
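## 🔍 FAQ

### Q1: How do I get more free tokens?
A: SiliconFlow offers a referral program; inviting other users to register earns additional tokens.

### Q2: Which image formats are supported?
A: Common formats such as JPG, PNG, WEBP, and BMP.

### Q3: Is there a size limit for a single image?
A: Keeping each image under 10 MB is recommended.
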
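
The 10 MB recommendation can be checked before an image is encoded and uploaded. A minimal sketch using only the standard library; `check_image_size` and `MAX_IMAGE_BYTES` are hypothetical names, and the threshold is the recommendation from this FAQ rather than a documented hard limit:

```python
import os

# 10 MB, following the recommendation in this FAQ
MAX_IMAGE_BYTES = 10 * 1024 * 1024


def check_image_size(image_path: str, max_bytes: int = MAX_IMAGE_BYTES) -> int:
    """Return the file size in bytes, or raise ValueError if it is too large."""
    size = os.path.getsize(image_path)
    if size > max_bytes:
        raise ValueError(
            f"{image_path} is {size} bytes, above the {max_bytes}-byte limit"
        )
    return size
```

Note that base64 encoding increases the payload by roughly a third, so the request body will be larger than the file on disk.
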
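### Q4: What if the recognition quality is poor?
A:
- Make sure the image is sharp enough
- Refine the prompt so it states your needs more explicitly
- Try adjusting the temperature parameter
- For complex documents, recognize the image region by region and merge the results
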
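
Region-by-region recognition needs a way to divide the page into crop boxes. The sketch below computes a grid of `(left, top, right, bottom)` boxes with optional overlap so that text on a boundary is not cut in half; `tile_boxes` is a hypothetical helper, and the grid layout is one possible strategy, not a method prescribed by the model's authors:

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def tile_boxes(width: int, height: int, rows: int, cols: int,
               overlap: int = 0) -> List[Box]:
    """Split a width x height image into rows x cols boxes, row by row."""
    tile_w = -(-width // cols)   # ceiling division
    tile_h = -(-height // rows)
    boxes = []
    for r in range(rows):
        for c in range(cols):
            # Extend each tile by the overlap, clamped to the image bounds
            left = max(c * tile_w - overlap, 0)
            top = max(r * tile_h - overlap, 0)
            right = min((c + 1) * tile_w + overlap, width)
            bottom = min((r + 1) * tile_h + overlap, height)
            boxes.append((left, top, right, bottom))
    return boxes
```

Each box uses the same coordinate order as Pillow's `Image.crop`, so the regions can be cropped with `image.crop(box)`, saved, and sent through the OCR call one at a time.
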
### Q5: Is the API call frequency limited?
A: Yes, rate limits apply; see the SiliconFlow platform documentation for the specific limits.

## 📚 Related Resources

- [SiliconFlow website](https://cloud.siliconflow.cn/)
- [SiliconFlow documentation](https://docs.siliconflow.cn/)
- [DeepSeek-OCR on GitHub](https://github.com/deepseek-ai/DeepSeek-OCR)
- [OpenAI Python SDK documentation](https://github.com/openai/openai-python)

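## 📝 Summary

Calling the DeepSeek-OCR model through the SiliconFlow platform takes five steps:

1. Register an account and get an API key
2. Install the openai library
3. Call the model using the OpenAI-compatible API format
4. Convert the image to base64 or pass a URL
5. Adjust the prompt and parameters to fit your needs

This tutorial covers scenarios from basic usage to advanced applications; pick the example code that fits your needs.
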
## 🎉 Get Started

Now that you have everything you need, register a SiliconFlow account and start your OCR journey!

---

**Last updated**: November 2025
**Author**: Claude
**License**: MIT License