Overview

PaddleOCR is a production-ready, multilingual OCR (Optical Character Recognition) and document AI engine that provides end-to-end solutions from text extraction to intelligent document understanding. Built on the PaddlePaddle 3.0 deep learning framework, it offers a modular, plugin-based architecture supporting over 80 languages, multiple text types (printed, handwritten, mixed), and deployment across heterogeneous hardware platforms.

This page provides a high-level overview of PaddleOCR's capabilities, architecture, and entry points. For detailed installation procedures, see Installation and Setup. For information about migrating from PaddleOCR 2.x, see Version 2.x to 3.x Migration. For specific pipeline documentation, see Core Pipelines.

Sources: README.md:21-38, docs/index.en.md:10-26, docs/update/upgrade_notes.en.md:1-13


Purpose and Scope

PaddleOCR 3.0 addresses three primary use cases:

  1. Universal Text Recognition: Extract text from images and documents across multiple languages and text types
  2. Document Structure Parsing: Convert complex PDFs and document images into structured formats (Markdown, JSON) while preserving layout and hierarchy
  3. Intelligent Information Extraction: Use LLM-enhanced methods to extract specific information from documents

The system provides a complete development lifecycle: data annotation → model training → optimization → deployment, with support for both low-code (via PaddleX integration) and full-code development workflows.

Sources: README.md:40-56, docs/index.en.md:17-25


Core Pipelines

PaddleOCR 3.0 organizes its functionality into four flagship pipelines that build upon each other through module composition.

Pipeline Characteristics:

| Pipeline | Python Class | Primary Purpose | Key Improvement |
|---|---|---|---|
| PP-OCRv5 | PaddleOCR | General OCR for text detection and recognition | 13% accuracy improvement; single model supports 5 text types |
| PP-StructureV3 | PPStructureV3 | Document parsing to Markdown/JSON | Outperforms commercial solutions on public benchmarks |
| PP-ChatOCRv4 | PPChatOCRv4Doc | LLM-enhanced key information extraction | 15% accuracy improvement with ERNIE 4.5 integration |
| PP-DocTranslation | PPDocTranslation | Intelligent document translation | Preserves layout while translating content |
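
The pipeline-to-class mapping in the table above can be captured in a small lookup helper. This is a minimal sketch, not part of PaddleOCR itself: `PIPELINE_CLASSES`, `pipeline_class_name`, and `load_pipeline_class` are hypothetical names, and the sketch assumes each class is exported from the top-level `paddleocr` package.

```python
import importlib

# Pipeline name -> Python class name, copied from the table above.
PIPELINE_CLASSES = {
    "PP-OCRv5": "PaddleOCR",
    "PP-StructureV3": "PPStructureV3",
    "PP-ChatOCRv4": "PPChatOCRv4Doc",
    "PP-DocTranslation": "PPDocTranslation",
}


def pipeline_class_name(pipeline: str) -> str:
    """Return the class name for a flagship pipeline, or raise ValueError."""
    try:
        return PIPELINE_CLASSES[pipeline]
    except KeyError:
        known = ", ".join(sorted(PIPELINE_CLASSES))
        raise ValueError(
            f"unknown pipeline {pipeline!r}; expected one of: {known}"
        ) from None


def load_pipeline_class(pipeline: str):
    """Import paddleocr lazily and return the pipeline class.

    Assumption: the class is importable from the top-level `paddleocr`
    package. This call fails if paddleocr is not installed.
    """
    module = importlib.import_module("paddleocr")
    return getattr(module, pipeline_class_name(pipeline))
```

The lazy import keeps the lookup usable (for example in configuration validation) on machines where PaddleOCR is not installed.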

Sources: README.md:48-56, docs/quick_start.en.md:39-196, paddleocr/__init__.py (implied structure)


System Architecture

Modular Design

PaddleOCR 3.0 employs a plugin-based architecture in which pipelines compose reusable modules.

This design enables:

  • Module Reuse: Core modules (text detection, recognition) are shared across multiple pipelines
  • Independent Development: Each module can be trained, optimized, and deployed separately
  • Flexible Composition: Users can combine modules to create custom workflows
  • Efficient Updates: Model improvements propagate to all dependent pipelines

Sources: README.md:58-62, docs/update/upgrade_notes.en.md:15-21, High-level architecture diagrams


Key Architectural Layers

Sources: README.md:58-79, docs/update/upgrade_notes.en.md:17-21, paddleocr/__main__.py:1-40


Installation and Quick Start

Installation

PaddleOCR 3.0 requires PaddlePaddle 3.0+ and provides dependency groups for different feature sets:

| Dependency Group | Installation Command | Features |
|---|---|---|
| Core (default) | pip install paddleocr | Basic text recognition (PP-OCR series) |
| doc-parser | pip install "paddleocr[doc-parser]" | Document parsing (PP-StructureV3) |
| ie | pip install "paddleocr[ie]" | Information extraction (PP-ChatOCRv4) |
| trans | pip install "paddleocr[trans]" | Document translation (PP-DocTranslation) |
| all | pip install "paddleocr[all]" | Complete functionality |
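
For setup scripts that need to pick a dependency group programmatically, the commands in the table above can be generated from the group name. The helper below is an illustrative sketch; `pip_install_command` and the `"core"` key are hypothetical names introduced here, while the extras themselves come from the table.

```python
# Optional dependency groups (pip extras) listed in the table above.
EXTRAS = {"doc-parser", "ie", "trans", "all"}


def pip_install_command(group: str = "core") -> str:
    """Build the pip command for a dependency group.

    "core" is a label used only by this sketch for the default install
    without extras.
    """
    if group == "core":
        return "pip install paddleocr"
    if group not in EXTRAS:
        known = ", ".join(["core", *sorted(EXTRAS)])
        raise ValueError(f"unknown group {group!r}; expected one of: {known}")
    # Quote the requirement so shells do not expand the square brackets.
    return f'pip install "paddleocr[{group}]"'


print(pip_install_command("doc-parser"))
```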

Sources: README.md:211-230, docs/quick_start.en.md:7-35, setup.py:1-19

Entry Points

PaddleOCR provides two primary interfaces:

1. Command-Line Interface (CLI)

CLI entry point: paddleocr/__main__.py:18-39
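
A CLI invocation can be assembled and inspected from Python before it is run. The sketch below assumes, based on the upstream quick-start examples rather than anything stated on this page, that the CLI takes a pipeline subcommand such as `ocr`, an `-i` input flag, and `--name value` options; `build_cli_command` is a hypothetical helper. Check `paddleocr --help` for the flags your installed version accepts.

```python
import shlex
from typing import Dict, List, Optional


def build_cli_command(
    subcommand: str,
    input_path: str,
    options: Optional[Dict[str, object]] = None,
) -> List[str]:
    """Return an argv list for the `paddleocr` command-line tool.

    Assumptions: `-i` names the input and every option is passed as
    `--name value`.
    """
    argv = ["paddleocr", subcommand, "-i", input_path]
    for name, value in (options or {}).items():
        argv += [f"--{name}", str(value)]
    return argv


cmd = build_cli_command("ocr", "page.png", {"use_doc_orientation_classify": False})
# Show the command as it would be typed in a shell.
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would execute it, provided the `paddleocr` executable is on the PATH.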

2. Python API
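
A minimal sketch of the Python interface follows. It assumes that `PaddleOCR` can be constructed with default arguments, that `predict()` accepts an image path and yields dict-like results, and that each result exposes `rec_texts` and `rec_scores`; these details are recalled from the 3.x quick start, not stated on this page, so verify them against your installed version. `confident_texts` and `run_ocr` are hypothetical helper names.

```python
from typing import List, Mapping


def confident_texts(result: Mapping, min_score: float = 0.5) -> List[str]:
    """Keep recognized strings whose confidence is at least `min_score`.

    Assumption: `result` maps "rec_texts" and "rec_scores" to parallel lists.
    """
    texts = result.get("rec_texts", [])
    scores = result.get("rec_scores", [])
    return [text for text, score in zip(texts, scores) if score >= min_score]


def run_ocr(image_path: str, min_score: float = 0.5) -> List[str]:
    """Run the general OCR pipeline on one image (needs paddleocr installed)."""
    from paddleocr import PaddleOCR  # imported lazily so the helper above stays testable

    ocr = PaddleOCR()
    lines: List[str] = []
    for result in ocr.predict(image_path):
        lines.extend(confident_texts(result, min_score))
    return lines
```

Keeping the filtering logic separate from the pipeline call makes it possible to unit-test post-processing without loading any models.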

Sources: README.md:231-265, docs/quick_start.en.md:37-196


Hardware and Deployment Support

Supported Hardware Platforms

PaddleOCR 3.0 supports deployment across heterogeneous hardware.

Sources: README.md:14-16, README.md:74-79, docs/version3.x/paddlex/overview.en.md:140-165

Deployment Modes

| Deployment Mode | Description | Use Case |
|---|---|---|
| Python Inference | Direct Python API usage | Development, prototyping, simple applications |
| High-Performance Inference | Optimized engines (TensorRT, OpenVINO, ONNX Runtime) | Production servers requiring maximum throughput |
| Service Deployment | REST API servers (Docker, HTTP) | Microservices, cloud applications |
| On-Device Deployment | Mobile/edge (Paddle-Lite) | Android, iOS, embedded systems |
| C++ Inference | Native C++ deployment | High-performance local applications |

Sources: README.md:372-376, Development to Deployment workflow diagram


Integration with PaddleX

PaddleOCR 3.0 integrates deeply with PaddleX, a low-code development platform.

PaddleX provides:

  • One-click model inference for all PaddleOCR pipelines
  • No-code training via cloud-based GUI at AI Studio
  • Unified deployment APIs across multiple hardware platforms
  • 200+ additional models for image classification, object detection, segmentation, time series

Sources: docs/version3.x/paddlex/overview.en.md:1-21, README.md:57, docs/update/upgrade_notes.en.md:20


Version 3.0 Architecture Evolution

PaddleOCR 3.0 is a complete architectural redesign of 2.x.

Key Changes:

| Aspect | PaddleOCR 2.x | PaddleOCR 3.x |
|---|---|---|
| Architecture | Monolithic with feature branches | Modular, plugin-based |
| Interfaces | Mixed, inconsistent APIs | Unified CLI and Python API |
| Pipeline Design | Single PPStructure class | Separate pipelines: PPStructureV3, PPChatOCRv4Doc, etc. |
| Deployment | Limited to PaddleServing | High-performance, service, on-device, C++ |
| Framework | PaddlePaddle 2.x | PaddlePaddle 3.0 with CINN compiler |
| LLM Integration | None | Native ERNIE 4.5 support in PP-ChatOCRv4 |

Breaking Changes:

  • PaddleOCR.ocr() no longer accepts the det and rec parameters (use the dedicated TextDetection and TextRecognition classes instead)
  • PPStructure class removed (replaced by PPStructureV3)
  • show_log parameter replaced by comprehensive logging system
  • use_onnx parameter replaced by high-performance inference configuration
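
When porting 2.x code, it can help to flag keyword arguments that the breaking changes above remove. The following is a hedged sketch of such a check, not a tool shipped with PaddleOCR: `REMOVED_PARAMETERS` and `find_legacy_arguments` are hypothetical names, and the hints paraphrase the list above.

```python
from typing import Dict, List, Tuple

# Parameters named in the breaking-changes list, with a short migration hint.
REMOVED_PARAMETERS = {
    "det": "use the dedicated TextDetection / TextRecognition classes",
    "rec": "use the dedicated TextDetection / TextRecognition classes",
    "show_log": "configure the logging system instead",
    "use_onnx": "use the high-performance inference configuration",
}


def find_legacy_arguments(kwargs: Dict[str, object]) -> List[Tuple[str, str]]:
    """Return (name, hint) pairs for 2.x-only arguments found in `kwargs`."""
    return [
        (name, REMOVED_PARAMETERS[name])
        for name in kwargs
        if name in REMOVED_PARAMETERS
    ]


# Example: keyword arguments collected from an old 2.x call site.
old_call = {"lang": "en", "show_log": False, "use_onnx": True}
for name, hint in find_legacy_arguments(old_call):
    print(f"{name}: {hint}")
```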

For detailed migration guidance, see Version 2.x to 3.x Migration.

Sources: docs/update/upgrade_notes.en.md:1-83, docs/update/upgrade_notes.md:1-84


Model Zoo and Capabilities

PaddleOCR 3.0 includes an extensive model zoo supporting diverse scenarios:

Text Recognition:

  • PP-OCRv5: 5 text types (Simplified Chinese, Traditional Chinese, English, Japanese, Pinyin) in a single model
  • Multilingual: 37+ languages with specialized models (French, Spanish, Portuguese, Russian, Korean, etc.)
  • Specialized: English-only, Thai, Greek models for domain-specific accuracy

Document Analysis:

  • Layout Detection: 23 layout categories for papers, reports, contracts, magazines
  • Table Recognition: Wired (bordered) and wireless (borderless) tables, including tables with nested formulas and images
  • Formula Recognition: LaTeX vocabulary of 50,000 entries, covering printed and handwritten formulas
  • Seal Recognition: Curved text OCR for official stamps

Preprocessing:

  • Document Orientation Classification: Detect and correct document rotation
  • Document Unwarping: Correct distorted/curved document images
  • Text Line Orientation: Classify individual text line angles

Sources: README.md:68-97, docs/index/index.en.md:36-42


Testing and Quality Assurance

PaddleOCR uses the TIPC (Training and Inference Pipeline Certification) framework for end-to-end testing.

TIPC validates that models maintain accuracy and performance across:

  • Multiple training configurations (quantization, pruning, distillation)
  • Different inference backends and optimization levels
  • Various hardware platforms without code changes
  • All precision modes (FP32, FP16, INT8)
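
The dimensions listed above multiply into a validation matrix. The sketch below only illustrates that idea and is not how TIPC is implemented: every name in it is hypothetical, the backend list is borrowed from the deployment table on this page, and the tolerance value is an arbitrary placeholder.

```python
from itertools import product
from typing import Dict, List, Sequence

# Dimensions taken from the bullet list above.
TRAINING_CONFIGS = ["quantization", "pruning", "distillation"]
PRECISIONS = ["FP32", "FP16", "INT8"]


def validation_matrix(backends: Sequence[str]) -> List[Dict[str, str]]:
    """Enumerate every (training config, backend, precision) combination."""
    return [
        {"training": training, "backend": backend, "precision": precision}
        for training, backend, precision in product(
            TRAINING_CONFIGS, backends, PRECISIONS
        )
    ]


def within_tolerance(reference: float, measured: float, tolerance: float = 0.01) -> bool:
    """Compare a measured accuracy with its reference value."""
    return abs(reference - measured) <= tolerance


cases = validation_matrix(["TensorRT", "OpenVINO", "ONNX Runtime"])
print(f"{len(cases)} combinations to validate")
```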

Sources: Testing and Quality Assurance infrastructure diagram, .github/workflows/ (implied CI/CD integration)


Documentation and Community

Documentation Structure:

  • Quick Start: Installation and basic usage examples
  • Pipeline Tutorials: Detailed guides for each flagship pipeline
  • Module Usage: Instructions for using individual modules independently
  • Deployment Guides: High-performance inference, service deployment, on-device deployment
  • Algorithm Documentation: Model architecture details and training procedures
  • API Reference: Complete API documentation for all public interfaces

Community Resources:

  • GitHub Repository: https
  • Online Demos: AI Studio, ModelScope, HuggingFace
  • Technical Support: GitHub Issues for bug reports, Discussions for questions
  • Contributing: Open to contributions from individual and enterprise developers

Sources: README.md:408-455, docs/index.en.md:34, .github/workflows/close_inactive_issues.yaml:1-24


Next Steps

To begin using PaddleOCR:

  1. Installation: Follow the Installation and Setup guide to install PaddleOCR with the appropriate dependency groups for your use case
  2. Quick Start: Try the examples in Quick Start to run inference with pre-trained models
  3. Choose Your Pipeline: Review the Core Pipelines section to select the pipeline that matches your requirements
  4. Deployment: Plan your deployment strategy using the Deployment and Inference documentation
  5. Customization: For custom model training, see Model Architecture and Training

For migration from PaddleOCR 2.x, start with Version 2.x to 3.x Migration.

Sources: README.md:205-244, docs/quick_start.en.md:1-197