MinerU is an open-source PDF conversion tool developed by OpenDataLab. It helps researchers and developers extract content from PDF documents and convert it into structured formats.
Key Features
- PDF to Markdown: Converts PDF documents into structured Markdown format, preserving headings, paragraphs, lists, and other document structures.
- PDF to JSON: Extracts PDF content into JSON format for easier data processing and analysis.
- Structure Preservation: Maintains the original layout and content integrity during conversion.
- Open Source: Released under an open-source license, so users can freely use, modify, and distribute it.
Use Cases
- Academic paper content extraction
- Report and document structuring
- Automated parsing of dataset documentation
- Knowledge base construction and content migration
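
For the knowledge base use case, converted Markdown usually needs to be split into sections before indexing. The sketch below is one possible post-processing step, not part of MinerU itself: it assumes the converted output is ordinary Markdown with `#`-style headings, and the function name `split_markdown_sections` is a hypothetical helper introduced only for illustration.

```python
import re
from typing import Dict, List

# Matches ATX-style Markdown headings such as "# Title" or "### Subsection".
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_markdown_sections(markdown: str) -> List[Dict[str, object]]:
    """Split Markdown text into heading-delimited sections.

    Hypothetical helper (not a MinerU API). Text before the first heading
    is kept as a section with level 0 and an empty title.
    """
    sections = []
    current = {"level": 0, "title": "", "body": []}
    in_fence = False

    def has_content(section):
        return bool(section["title"]) or any(s.strip() for s in section["body"])

    for line in markdown.splitlines():
        # Track fenced code blocks so "#" comments inside them are not
        # mistaken for headings.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            if has_content(current):
                sections.append(current)
            current = {
                "level": len(match.group(1)),
                "title": match.group(2),
                "body": [],
            }
        else:
            current["body"].append(line)

    if has_content(current):
        sections.append(current)

    return [
        {
            "level": s["level"],
            "title": s["title"],
            "text": "\n".join(s["body"]).strip(),
        }
        for s in sections
    ]


if __name__ == "__main__":
    sample = "# Intro\nHello\n\n## Methods\nStep 1\n"
    for section in split_markdown_sections(sample):
        print(section["level"], section["title"], "->", section["text"])
```

Each returned section carries its heading level and title, so a downstream indexer can store the title as metadata and the text as the searchable body.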
Project URL
https://github.com/opendatalab/MinerU