PaddleOCR is a production-ready, multilingual OCR (Optical Character Recognition) and document AI engine that provides end-to-end solutions from text extraction to intelligent document understanding. Built on the PaddlePaddle 3.0 deep learning framework, it offers a modular, plugin-based architecture supporting over 80 languages, multiple text types (printed, handwritten, mixed), and deployment across heterogeneous hardware platforms.
This page provides a high-level overview of PaddleOCR's capabilities, architecture, and entry points. For detailed installation procedures, see Installation and Setup. For information about migrating from PaddleOCR 2.x, see Version 2.x to 3.x Migration. For specific pipeline documentation, see Core Pipelines.
Sources: README.md21-38 docs/index.en.md10-26 docs/update/upgrade_notes.en.md1-13
PaddleOCR 3.0 addresses three primary use cases:
The system provides a complete development lifecycle: data annotation → model training → optimization → deployment, with support for both low-code (via PaddleX integration) and full-code development workflows.
Sources: README.md40-56 docs/index.en.md17-25
PaddleOCR 3.0 organizes its functionality into four flagship pipelines that build upon each other through module composition:
Pipeline Characteristics:
Pipeline | Python Class | Primary Purpose | Key Improvement |
---|---|---|---|
PP-OCRv5 | PaddleOCR | General OCR for text detection and recognition | 13% accuracy improvement over the previous generation; single model supports 5 text types |
PP-StructureV3 | PPStructureV3 | Document parsing to Markdown/JSON | Outperforms commercial solutions on public benchmarks |
PP-ChatOCRv4 | PPChatOCRv4Doc | LLM-enhanced key information extraction | 15% accuracy improvement over the previous generation, with ERNIE 4.5 integration |
PP-DocTranslation | PPDocTranslation | Intelligent document translation | Preserves layout while translating content |
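
The pipeline-to-class mapping in the table can be captured in a small lookup helper. This is a minimal sketch: the class names are taken from the table, while `PIPELINE_CLASSES` and `load_pipeline_class` are hypothetical names introduced here for illustration. It assumes the classes are importable from the top-level `paddleocr` package, and the import is deferred so the mapping can be inspected without PaddleOCR installed.

```python
import importlib

# Pipeline name -> Python class name, copied from the table above.
PIPELINE_CLASSES = {
    "PP-OCRv5": "PaddleOCR",
    "PP-StructureV3": "PPStructureV3",
    "PP-ChatOCRv4": "PPChatOCRv4Doc",
    "PP-DocTranslation": "PPDocTranslation",
}


def load_pipeline_class(pipeline_name: str):
    """Return the class for a pipeline, importing paddleocr only when called."""
    try:
        class_name = PIPELINE_CLASSES[pipeline_name]
    except KeyError:
        known = ", ".join(sorted(PIPELINE_CLASSES))
        raise ValueError(
            f"Unknown pipeline {pipeline_name!r}; expected one of: {known}"
        ) from None
    # Assumption: the class is exposed at the top level of the paddleocr package.
    module = importlib.import_module("paddleocr")
    return getattr(module, class_name)
```

With PaddleOCR installed, `load_pipeline_class("PP-StructureV3")` would return the `PPStructureV3` class, ready to be instantiated.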
Sources: README.md48-56 docs/quick_start.en.md39-196 paddleocr/__init__.py (implied structure)
PaddleOCR 3.0 employs a plugin-based architecture where pipelines compose reusable modules:
This design enables:
Sources: README.md58-62 docs/update/upgrade_notes.en.md15-21 High-level architecture diagrams
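The idea of pipelines composing reusable modules can be illustrated with a toy example. This sketch does not reproduce PaddleOCR's internal classes: `Pipeline`, `detect_text`, `recognize_text`, and `to_markdown` are hypothetical stand-ins that only show how one pipeline can reuse another pipeline's modules and append its own.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# A module is modelled here as a function that reads and extends a shared dict.
Module = Callable[[Dict[str, Any]], Dict[str, Any]]


@dataclass
class Pipeline:
    """Runs a list of modules in order, passing one state dict through them."""

    name: str
    modules: List[Module] = field(default_factory=list)

    def run(self, state: Dict[str, Any]) -> Dict[str, Any]:
        for module in self.modules:
            state = module(state)
        return state


def detect_text(state: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical detector: treats each non-empty line as one text region.
    regions = [line for line in state["page"].splitlines() if line.strip()]
    return {**state, "regions": regions}


def recognize_text(state: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical recognizer: "reads" each region by trimming whitespace.
    return {**state, "texts": [region.strip() for region in state["regions"]]}


def to_markdown(state: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical export step used only by the document-parsing pipeline.
    return {**state, "markdown": "\n\n".join(state["texts"])}


# The document parser reuses the OCR modules and adds one more step.
ocr = Pipeline("ocr", [detect_text, recognize_text])
doc_parser = Pipeline("doc-parser", ocr.modules + [to_markdown])

result = doc_parser.run({"page": " Invoice 42 \n\n Total: 10 "})
print(result["markdown"])
```

Because `doc_parser` is built from `ocr.modules`, a change to the detection or recognition step is picked up by both pipelines without duplicating code.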
Sources: README.md58-79 docs/update/upgrade_notes.en.md17-21 paddleocr/__main__.py1-40
PaddleOCR 3.0 requires PaddlePaddle 3.0+ and provides dependency groups for different feature sets:
Dependency Group | Installation Command | Features |
---|---|---|
Core (default) | pip install paddleocr | Basic text recognition (PP-OCR series) |
doc-parser | pip install "paddleocr[doc-parser]" | Document parsing (PP-StructureV3) |
ie | pip install "paddleocr[ie]" | Information extraction (PP-ChatOCRv4) |
trans | pip install "paddleocr[trans]" | Document translation (PP-DocTranslation) |
all | pip install "paddleocr[all]" | Complete functionality |
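
The installation commands in the table follow one pattern, which the sketch below encodes. The group names and feature descriptions are copied from the table; the key `"core"` and the helper `install_command` are hypothetical names used here for illustration.

```python
# Dependency groups and their features, copied from the table above.
# "core" is a label used here for the default install with no extras.
FEATURES_BY_GROUP = {
    "core": "Basic text recognition (PP-OCR series)",
    "doc-parser": "Document parsing (PP-StructureV3)",
    "ie": "Information extraction (PP-ChatOCRv4)",
    "trans": "Document translation (PP-DocTranslation)",
    "all": "Complete functionality",
}


def install_command(group: str = "core") -> str:
    """Build the pip command for one dependency group."""
    if group not in FEATURES_BY_GROUP:
        known = ", ".join(FEATURES_BY_GROUP)
        raise ValueError(f"Unknown group {group!r}; expected one of: {known}")
    if group == "core":
        return "pip install paddleocr"
    # Quotes keep the square brackets from being expanded by the shell.
    return f'pip install "paddleocr[{group}]"'


for name, features in FEATURES_BY_GROUP.items():
    print(f"{install_command(name):40} # {features}")
```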
Sources: README.md211-230 docs/quick_start.en.md7-35 setup.py1-19
PaddleOCR provides two primary interfaces:
1. Command-Line Interface (CLI)
CLI entry point: paddleocr/__main__.py18-39
2. Python API
Sources: README.md231-265 docs/quick_start.en.md37-196
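A minimal sketch of both interfaces for the general OCR pipeline follows. It assumes the 3.x `paddleocr ocr -i <input>` subcommand and the `PaddleOCR.predict()` method with result objects that offer `print()` and `save_to_json()`. The helper names `build_cli_command` and `run_python_api` are hypothetical, and the import is deferred so the file loads without PaddleOCR installed.

```python
from typing import List


def build_cli_command(image_path: str) -> List[str]:
    """Argument list for the `paddleocr ocr` subcommand (general OCR pipeline)."""
    return ["paddleocr", "ocr", "-i", image_path]


def run_python_api(image_path: str, output_dir: str = "output") -> None:
    """Run general OCR through the Python API and save the results as JSON."""
    # Imported lazily; requires `pip install paddleocr`.
    from paddleocr import PaddleOCR

    ocr = PaddleOCR()
    for res in ocr.predict(image_path):
        res.print()                   # log the recognized text and boxes
        res.save_to_json(output_dir)  # write one JSON file per input
```

The CLI variant can be launched with `subprocess.run(build_cli_command("page.png"), check=True)`, while `run_python_api("page.png")` performs the same recognition in-process.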
PaddleOCR 3.0 supports deployment across heterogeneous hardware:
Sources: README.md14-16 README.md74-79 docs/version3.x/paddlex/overview.en.md140-165
Deployment Mode | Description | Use Case |
---|---|---|
Python Inference | Direct Python API usage | Development, prototyping, simple applications |
High-Performance Inference | Optimized engines (TensorRT, OpenVINO, ONNX Runtime) | Production servers requiring maximum throughput |
Service Deployment | REST API servers (Docker, HTTP) | Microservices, cloud applications |
On-Device Deployment | Mobile/edge (Paddle-Lite) | Android, iOS, embedded systems |
C++ Inference | Native C++ deployment | High-performance local applications |
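
For the service deployment mode, a client typically sends an encoded image over HTTP. The sketch below uses only the standard library and rests on several assumptions that should be checked against the serving documentation for your version: a server listening at `http://localhost:8080`, an `/ocr` endpoint, and a JSON body with `file` (Base64 content) and `fileType` (1 for an image) fields. The function names are hypothetical.

```python
import base64
import json
import urllib.request


def build_ocr_payload(image_bytes: bytes) -> dict:
    """Encode an image as the JSON body assumed by the OCR service."""
    return {
        "file": base64.b64encode(image_bytes).decode("ascii"),
        "fileType": 1,  # assumption: 1 means image, 0 means PDF
    }


def call_ocr_service(image_bytes: bytes, url: str = "http://localhost:8080/ocr") -> dict:
    """POST the image to the service and return the decoded JSON response."""
    body = json.dumps(build_ocr_payload(image_bytes)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping payload construction separate from the network call makes the request format easy to inspect and adjust if the deployed service expects different field names.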
Sources: README.md372-376 Development to Deployment workflow diagram
PaddleOCR 3.0 integrates deeply with PaddleX, a low-code development platform:
PaddleX provides:
Sources: docs/version3.x/paddlex/overview.en.md1-21 README.md57 docs/update/upgrade_notes.en.md20
PaddleOCR 3.0 represents a complete architectural redesign from 2.x:
Key Changes:
Aspect | PaddleOCR 2.x | PaddleOCR 3.x |
---|---|---|
Architecture | Monolithic with feature branches | Modular, plugin-based |
Interfaces | Mixed, inconsistent APIs | Unified CLI and Python API |
Pipeline Design | Single PPStructure class | Separate pipelines: PPStructureV3, PPChatOCRv4Doc, etc. |
Deployment | Limited to PaddleServing | High-performance, service, on-device, C++ |
Framework | PaddlePaddle 2.x | PaddlePaddle 3.0 with CINN compiler |
LLM Integration | None | Native ERNIE 4.5 support in PP-ChatOCRv4 |
Breaking Changes:
- `PaddleOCR.ocr()` method no longer accepts `det`, `rec` parameters (use dedicated `TextDetection`, `TextRecognition` classes instead)
- `PPStructure` class removed (replaced by `PPStructureV3`)
- `show_log` parameter replaced by comprehensive logging system
- `use_onnx` parameter replaced by high-performance inference configuration

For detailed migration guidance, see Version 2.x to 3.x Migration.
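
When auditing 2.x code, the removed arguments listed above can be flagged mechanically. This is a minimal sketch: the argument names and their replacements come from the breaking-changes list, while `find_legacy_arguments` and the two lookup tables are hypothetical helpers, not part of PaddleOCR.

```python
from typing import Any, Dict, List

# Removed 2.x arguments and the replacement named in the list above.
REMOVED_INIT_KWARGS = {
    "show_log": "the logging system",
    "use_onnx": "the high-performance inference configuration",
}
REMOVED_OCR_KWARGS = {
    "det": "the dedicated TextDetection class",
    "rec": "the dedicated TextRecognition class",
}


def find_legacy_arguments(
    init_kwargs: Dict[str, Any], ocr_kwargs: Dict[str, Any]
) -> List[str]:
    """Return one message per 2.x-only argument found in the given kwargs."""
    messages = []
    for name in init_kwargs:
        if name in REMOVED_INIT_KWARGS:
            messages.append(
                f"PaddleOCR({name}=...) was removed; use {REMOVED_INIT_KWARGS[name]}"
            )
    for name in ocr_kwargs:
        if name in REMOVED_OCR_KWARGS:
            messages.append(
                f"ocr({name}=...) was removed; use {REMOVED_OCR_KWARGS[name]}"
            )
    return messages


for message in find_legacy_arguments({"show_log": False, "lang": "en"}, {"det": True}):
    print(message)
```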
Sources: docs/update/upgrade_notes.en.md1-83 docs/update/upgrade_notes.md1-84
PaddleOCR 3.0 includes an extensive model zoo supporting diverse scenarios:
Text Recognition:
Document Analysis:
Preprocessing:
Sources: README.md68-97 docs/index/index.en.md36-42
PaddleOCR employs the TIPC (Training and Inference Pipeline Certification) framework for comprehensive testing:
TIPC validates that models maintain accuracy and performance across:
Sources: Testing and Quality Assurance infrastructure diagram, .github/workflows/ (implied CI/CD integration)
Documentation Structure:
Community Resources:
Sources: README.md408-455 docs/index.en.md34 .github/workflows/close_inactive_issues.yaml1-24
To begin using PaddleOCR:
For migration from PaddleOCR 2.x, start with Version 2.x to 3.x Migration.