Overview

PaddleOCR is a production-ready, multilingual OCR (Optical Character Recognition) and document AI engine that provides end-to-end solutions from text extraction to intelligent document understanding. Built on the PaddlePaddle 3.0 deep learning framework, it offers a modular, plugin-based architecture supporting over 80 languages, multiple text types (printed, handwritten, mixed), and deployment across heterogeneous hardware platforms.

This page provides a high-level overview of PaddleOCR's capabilities, architecture, and entry points. For detailed installation procedures, see Installation and Setup. For information about migrating from PaddleOCR 2.x, see Version 2.x to 3.x Migration. For specific pipeline documentation, see Core Pipelines.

Sources: README.md21-38 docs/index.en.md10-26 docs/update/upgrade_notes.en.md1-13

Purpose and Scope

PaddleOCR 3.0 addresses three primary use cases:

Universal Text Recognition: Extract text from images and documents across multiple languages and text types
Document Structure Parsing: Convert complex PDFs and document images into structured formats (Markdown, JSON) while preserving layout and hierarchy
Intelligent Information Extraction: Use LLM-enhanced methods to extract specific information from documents

The system provides a complete development lifecycle: data annotation → model training → optimization → deployment, with support for both low-code (via PaddleX integration) and full-code development workflows.

Sources: README.md40-56 docs/index.en.md17-25

Core Pipelines

PaddleOCR 3.0 organizes its functionality into four flagship pipelines that build upon each other through module composition:

Pipeline Characteristics:

Pipeline	Python Class	Primary Purpose	Key Improvement
PP-OCRv5	`PaddleOCR`	General OCR for text detection and recognition	13% accuracy improvement; single model supports 5 text types
PP-StructureV3	`PPStructureV3`	Document parsing to Markdown/JSON	Outperforms commercial solutions on public benchmarks
PP-ChatOCRv4	`PPChatOCRv4Doc`	LLM-enhanced key information extraction	15% accuracy improvement with ERNIE 4.5 integration
PP-DocTranslation	`PPDocTranslation`	Intelligent document translation	Preserves layout while translating content
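
When the pipeline is chosen from configuration rather than hard-coded, the class names in the table can be kept in a small lookup. The following is a minimal sketch, not part of PaddleOCR: `PIPELINE_CLASSES` and `load_pipeline_class` are hypothetical names, and the lazy import assumes the four classes are exported from the top-level `paddleocr` package.

```python
import importlib

# Pipeline name -> Python class name, copied from the table above.
PIPELINE_CLASSES = {
    "PP-OCRv5": "PaddleOCR",
    "PP-StructureV3": "PPStructureV3",
    "PP-ChatOCRv4": "PPChatOCRv4Doc",
    "PP-DocTranslation": "PPDocTranslation",
}


def load_pipeline_class(pipeline_name):
    """Return the class for a pipeline name (hypothetical helper)."""
    try:
        class_name = PIPELINE_CLASSES[pipeline_name]
    except KeyError:
        raise ValueError(f"unknown pipeline: {pipeline_name!r}") from None
    # Assumption: the class is exported from the top-level package.
    # The import happens only here, so the lookup table itself has no dependencies.
    module = importlib.import_module("paddleocr")
    return getattr(module, class_name)
```

With PaddleOCR installed, `load_pipeline_class("PP-StructureV3")` would return the `PPStructureV3` class; an unknown name raises `ValueError` before any import is attempted.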

Sources: README.md48-56 docs/quick_start.en.md39-196 paddleocr/__init__.py (implied structure)

System Architecture

Modular Design

PaddleOCR 3.0 employs a plugin-based architecture where pipelines compose reusable modules:

This design enables:

Module Reuse: Core modules (text detection, recognition) are shared across multiple pipelines
Independent Development: Each module can be trained, optimized, and deployed separately
Flexible Composition: Users can combine modules to create custom workflows
Efficient Updates: Model improvements propagate to all dependent pipelines
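
The idea of pipelines composing shared modules can be illustrated with a toy model. This sketch does not reflect PaddleOCR's internal classes: `Pipeline` and the three stand-in module functions are hypothetical, and each "module" simply adds a key to a state dictionary.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# A module is modelled as a function from a state dict to a new state dict.
Module = Callable[[dict], dict]


def text_detection(state):
    # Stand-in: pretend exactly one text region was found.
    return {**state, "boxes": [(0, 0, 10, 10)]}


def text_recognition(state):
    # Stand-in: emit one placeholder string per detected box.
    return {**state, "texts": ["<text>" for _ in state.get("boxes", [])]}


def layout_detection(state):
    # Stand-in: pretend the page contains a single paragraph region.
    return {**state, "layout": ["paragraph"]}


@dataclass
class Pipeline:
    name: str
    modules: List[Module] = field(default_factory=list)

    def run(self, state):
        # Apply each module in order, threading the state through.
        for module in self.modules:
            state = module(state)
        return state


# Two pipelines that share the same detection and recognition modules.
ocr = Pipeline("ocr", [text_detection, text_recognition])
doc_parser = Pipeline(
    "doc-parser", [layout_detection, text_detection, text_recognition]
)
```

Because both pipelines hold references to the same module functions, replacing `text_recognition` with an improved version would affect both, which is the "Efficient Updates" property in miniature.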

Sources: README.md58-62 docs/update/upgrade_notes.en.md15-21 High-level architecture diagrams

Key Architectural Layers

Sources: README.md58-79 docs/update/upgrade_notes.en.md17-21 paddleocr/__main__.py1-40

Installation and Quick Start

Installation

PaddleOCR 3.0 requires PaddlePaddle 3.0+ and provides dependency groups for different feature sets:

Dependency Group	Installation Command	Features
Core (default)	`pip install paddleocr`	Basic text recognition (PP-OCR series)
`doc-parser`	`pip install "paddleocr[doc-parser]"`	Document parsing (PP-StructureV3)
`ie`	`pip install "paddleocr[ie]"`	Information extraction (PP-ChatOCRv4)
`trans`	`pip install "paddleocr[trans]"`	Document translation (PP-DocTranslation)
`all`	`pip install "paddleocr[all]"`	Complete functionality
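
For scripted environments, the install commands in the table can be generated rather than typed. A minimal sketch, assuming only the group names listed above; `DEPENDENCY_GROUPS` and `pip_install_args` are hypothetical names, and `"core"` is used here as a label for the default install.

```python
import sys

# Dependency groups from the table above; "core" stands for the default install.
DEPENDENCY_GROUPS = ("core", "doc-parser", "ie", "trans", "all")


def pip_install_args(group="core"):
    """Build the argument list for installing PaddleOCR with one dependency group."""
    if group not in DEPENDENCY_GROUPS:
        raise ValueError(f"unknown dependency group: {group!r}")
    target = "paddleocr" if group == "core" else f"paddleocr[{group}]"
    # Using the current interpreter keeps the install in the active environment.
    return [sys.executable, "-m", "pip", "install", target]
```

The returned list can be passed to `subprocess.run(..., check=True)`. Passing arguments as a list avoids the shell quoting that the bracketed extras need on the command line.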

Sources: README.md211-230 docs/quick_start.en.md7-35 setup.py1-19

Entry Points

PaddleOCR provides two primary interfaces:

1. Command-Line Interface (CLI)

CLI entry point: paddleocr/__main__.py18-39
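
The CLI can also be driven from a script. The sketch below assumes the `paddleocr ocr -i <input>` form of the general OCR subcommand; `build_ocr_command` and `run_ocr_cli` are hypothetical helper names, and any further flags are left to the caller.

```python
import shutil
import subprocess


def build_ocr_command(image_path, extra_args=()):
    """Build the argv for the `paddleocr ocr` subcommand (assumed form)."""
    return ["paddleocr", "ocr", "-i", str(image_path), *extra_args]


def run_ocr_cli(image_path, extra_args=()):
    """Run the CLI on one image, failing early if it is not installed."""
    if shutil.which("paddleocr") is None:
        raise RuntimeError("the paddleocr executable is not on PATH")
    # check=True raises CalledProcessError on a non-zero exit code.
    return subprocess.run(build_ocr_command(image_path, extra_args), check=True)
```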

2. Python API
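
A minimal sketch of the Python API for general OCR follows. It assumes the `PaddleOCR` class with a `predict` method, the three preprocessing switches shown, and result objects that expose `rec_texts` and `rec_scores` by key; verify these against the installed version. `recognize` and `keep_confident` are hypothetical helper names.

```python
def keep_confident(texts, scores, min_score):
    """Keep only the recognized strings whose score reaches min_score."""
    return [text for text, score in zip(texts, scores) if score >= min_score]


def recognize(image_path, min_score=0.5):
    """Run general OCR on one image and return the confident text lines."""
    # Lazy import so this file can be loaded without PaddleOCR installed.
    from paddleocr import PaddleOCR

    ocr = PaddleOCR(
        use_doc_orientation_classify=False,  # skip document rotation correction
        use_doc_unwarping=False,             # skip document unwarping
        use_textline_orientation=False,      # skip per-line orientation
    )
    lines = []
    for res in ocr.predict(image_path):
        # Assumption: each result is dict-like with these two keys.
        lines.extend(keep_confident(res["rec_texts"], res["rec_scores"], min_score))
    return lines
```

Turning off the three preprocessing steps keeps the example to detection plus recognition; they correspond to the preprocessing modules listed under Model Zoo and Capabilities.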

Sources: README.md231-265 docs/quick_start.en.md37-196

Hardware and Deployment Support

Supported Hardware Platforms

PaddleOCR 3.0 supports deployment across heterogeneous hardware:

Sources: README.md14-16 README.md74-79 docs/version3.x/paddlex/overview.en.md140-165

Deployment Modes

Deployment Mode	Description	Use Case
Python Inference	Direct Python API usage	Development, prototyping, simple applications
High-Performance Inference	Optimized engines (TensorRT, OpenVINO, ONNX Runtime)	Production servers requiring maximum throughput
Service Deployment	REST API servers (Docker, HTTP)	Microservices, cloud applications
On-Device Deployment	Mobile/edge (Paddle-Lite)	Android, iOS, embedded systems
C++ Inference	Native C++ deployment	High-performance local applications

Sources: README.md372-376 Development to Deployment workflow diagram

Integration with PaddleX

PaddleOCR 3.0 integrates deeply with PaddleX, a low-code development platform:

PaddleX provides:

One-click model inference for all PaddleOCR pipelines
No-code training via cloud-based GUI at AI Studio
Unified deployment APIs across multiple hardware platforms
200+ additional models for image classification, object detection, segmentation, time series

Sources: docs/version3.x/paddlex/overview.en.md1-21 README.md57 docs/update/upgrade_notes.en.md20

Version 3.0 Architecture Evolution

PaddleOCR 3.0 represents a complete architectural redesign from 2.x:

Key Changes:

Aspect	PaddleOCR 2.x	PaddleOCR 3.x
Architecture	Monolithic with feature branches	Modular, plugin-based
Interfaces	Mixed, inconsistent APIs	Unified CLI and Python API
Pipeline Design	Single `PPStructure` class	Separate pipelines: `PPStructureV3`, `PPChatOCRv4Doc`, etc.
Deployment	Limited to PaddleServing	High-performance, service, on-device, C++
Framework	PaddlePaddle 2.x	PaddlePaddle 3.0 with CINN compiler
LLM Integration	None	Native ERNIE 4.5 support in PP-ChatOCRv4

Breaking Changes:

`PaddleOCR.ocr()` no longer accepts the `det` and `rec` parameters (use the dedicated `TextDetection` and `TextRecognition` classes instead)
The `PPStructure` class has been removed (replaced by `PPStructureV3`)
The `show_log` parameter has been replaced by a comprehensive logging system
The `use_onnx` parameter has been replaced by the high-performance inference configuration
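
When migrating a codebase, it can help to flag the removed arguments before running anything. The sketch below only encodes the breaking changes listed above; `LEGACY_ARGUMENTS` and `find_legacy_arguments` are hypothetical names, not part of PaddleOCR.

```python
# Removed 2.x arguments mapped to the replacement described in the list above.
LEGACY_ARGUMENTS = {
    "det": "use the dedicated TextDetection / TextRecognition classes",
    "rec": "use the dedicated TextDetection / TextRecognition classes",
    "show_log": "configure the logging system instead",
    "use_onnx": "use the high-performance inference configuration",
}


def find_legacy_arguments(kwargs):
    """Return the 2.x-only arguments present in kwargs, with migration hints."""
    return {
        name: LEGACY_ARGUMENTS[name]
        for name in kwargs
        if name in LEGACY_ARGUMENTS
    }
```

For example, `find_legacy_arguments({"det": False, "lang": "en"})` reports only `det`, since `lang` is not in the list of removed arguments.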

For detailed migration guidance, see Version 2.x to 3.x Migration.

Sources: docs/update/upgrade_notes.en.md1-83 docs/update/upgrade_notes.md1-84

Model Zoo and Capabilities

PaddleOCR 3.0 includes an extensive model zoo supporting diverse scenarios:

Text Recognition:

PP-OCRv5: Five text types (Simplified Chinese, Traditional Chinese, English, Japanese, Pinyin) in a single model
Multilingual: 37+ languages with specialized models (French, Spanish, Portuguese, Russian, Korean, etc.)
Specialized: English-only, Thai, Greek models for domain-specific accuracy

Document Analysis:

Layout Detection: 23 layout categories for papers, reports, contracts, magazines
Table Recognition: Wired (bordered) and wireless (borderless) tables, including tables with nested formulas or images
Formula Recognition: LaTeX vocabulary of 50,000 entries, supporting printed and handwritten formulas
Seal Recognition: Curved text OCR for official stamps

Preprocessing:

Document Orientation Classification: Detect and correct document rotation
Document Unwarping: Correct distorted/curved document images
Text Line Orientation: Classify individual text line angles

Sources: README.md68-97 docs/index/index.en.md36-42

Testing and Quality Assurance

PaddleOCR employs the TIPC (Training and Inference Pipeline Criterion) framework for comprehensive testing:

TIPC validates that models maintain accuracy and performance across:

Multiple training configurations (quantization, pruning, distillation)
Different inference backends and optimization levels
Various hardware platforms without code changes
All precision modes (FP32, FP16, INT8)
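
The coverage described above is a cross product of independent axes. The sketch below enumerates two of them using only the values named in the list; it is an illustration of the test-matrix idea, not TIPC's actual configuration format, and `build_matrix` is a hypothetical name.

```python
from itertools import product

# Axis values taken from the list above.
TRAINING_CONFIGS = ("quantization", "pruning", "distillation")
PRECISION_MODES = ("FP32", "FP16", "INT8")


def build_matrix():
    """Enumerate every (training configuration, precision mode) combination."""
    return [
        {"training": training, "precision": precision}
        for training, precision in product(TRAINING_CONFIGS, PRECISION_MODES)
    ]
```

Adding further axes, such as inference backend or hardware platform, multiplies the number of cases, which is why this kind of validation is usually automated.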

Sources: Testing and Quality Assurance infrastructure diagram, .github/workflows/ (implied CI/CD integration)

Documentation and Community

Documentation Structure:

Quick Start: Installation and basic usage examples
Pipeline Tutorials: Detailed guides for each flagship pipeline
Module Usage: Instructions for using individual modules independently
Deployment Guides: High-performance inference, service deployment, on-device deployment
Algorithm Documentation: Model architecture details and training procedures
API Reference: Complete API documentation for all public interfaces

Community Resources:

GitHub Repository: https
Online Demos: AI Studio, ModelScope, HuggingFace
Technical Support: GitHub Issues for bug reports, Discussions for questions
Contributing: Open to contributions from individual and enterprise developers

Sources: README.md408-455 docs/index.en.md34 .github/workflows/close_inactive_issues.yaml1-24

Next Steps

To begin using PaddleOCR:

Installation: Follow the Installation and Setup guide to install PaddleOCR with the appropriate dependency groups for your use case
Quick Start: Run the examples in Quick Start to try inference with pre-trained models
Choose Your Pipeline: Review the Core Pipelines section to select the pipeline that matches your requirements
Deployment: Plan your deployment strategy using the Deployment and Inference documentation
Customization: For custom model training, see Model Architecture and Training

For migration from PaddleOCR 2.x, start with Version 2.x to 3.x Migration.

Sources: README.md205-244 docs/quick_start.en.md1-197