Introduction

Convert PDFs, Word Docs, Excel Sheets, PowerPoint, CSVs, Web Pages, HTML, and Markdown to clean JSON with full control over your schema. Open-Source. Fast. Scalable.

What is Dcup?

Dcup is an open-source tool for converting messy, unstructured data into clean, structured JSON. Whether the source is a PDF, a Word document, an Excel sheet, or a web page, Dcup turns raw content into precisely formatted data. You define the schema, and Dcup handles the conversion.

Why Does This Exist?

Data work can be overwhelming, especially when you're spending 80% of your time cleaning and formatting. Developers waste hours parsing invoices, AI teams struggle with inconsistent training data, and analysts drown in endless CSVs. Existing tools tend to be too slow, too rigid, or too expensive.

Dcup was created to solve these problems with speed, flexibility, and simplicity.

Why Dcup?

  • Customizable JSON, Your Way: Define exactly what data you need and how to structure it. Whether it’s customer names, invoice totals, or any other field, you describe it in your schema and Dcup extracts it.

  • Support for All File Types: PDFs, Word documents, Excel sheets, PowerPoint files, HTML, CSVs, Markdown, and even URLs. If it’s a document, Dcup can parse it. No data left behind.

  • Lightning-Fast Performance: Built with Go and optimized caching, Dcup processes 100 documents in the time it takes to brew a cup of coffee. Get your results faster than ever.

  • Scalable and Reliable: From individual developers to enterprise-grade pipelines, Dcup scales with you. With async processing, auto-retries, and built-in error handling, you can trust Dcup to handle your data needs at any scale.

Who’s This For?

  • Developers tired of writing yet another PDF parser.

  • Data Engineers building pipelines that don’t break.

  • AI/ML Teams needing clean, structured training data.

  • Business Users who just want their spreadsheets to work.

How It Works

  1. You Define the Schema: Tell us what data to extract and how to format it.

Example:

{  
  "invoice": {  
    "customer": { "name": "string", "id": "number" },  
    "total": "number",  
    "items": []  
  }  
}
  2. You Send the Files/URLs: Drag-and-drop or send via API.

  3. We Return Perfect JSON: Every time, exactly as you specified.
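
To make the schema example concrete, here is a minimal Python sketch of how a client might check that a returned result has the shape of the schema it sent. The reading of the schema used here ("string" and "number" as type names, an empty list as "a list of anything") and the helper name `conforms` are assumptions made for illustration; they are not Dcup's documented behavior, and the sample result is made up.

```python
def conforms(schema, value):
    """Return True if value has the shape described by schema.

    Assumed (hypothetical) schema semantics:
      - "string" / "number" name the expected primitive type
      - a dict describes an object that must contain those keys
      - a list describes an array; an empty list accepts any items
    """
    if schema == "string":
        return isinstance(value, str)
    if schema == "number":
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if isinstance(schema, dict):
        return isinstance(value, dict) and all(
            key in value and conforms(sub, value[key])
            for key, sub in schema.items()
        )
    if isinstance(schema, list):
        if not isinstance(value, list):
            return False
        # A non-empty schema list uses its first element as the item schema
        return all(conforms(schema[0], item) for item in value) if schema else True
    return False


# The schema from the example above
schema = {
    "invoice": {
        "customer": {"name": "string", "id": "number"},
        "total": "number",
        "items": [],
    }
}

# A made-up result, shaped the way the schema asks for
result = {
    "invoice": {
        "customer": {"name": "Acme Corp", "id": 1042},
        "total": 199.5,
        "items": [{"sku": "A-1", "qty": 2}],
    }
}

print(conforms(schema, result))
```

A check like this is useful on the client side of a pipeline: if a result ever fails it, the document can be flagged for review instead of flowing silently into downstream systems.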

Under the Hood

  • Golang Backend: For raw speed and low memory usage. Your 100MB PDF? Processed before competitors finish loading.

  • Caching Mechanism: Frequently used schemas and templates are cached, slashing latency for repeat requests.

  • Stateless Architecture: Horizontally scalable. Add more instances in seconds, not hours.
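
The caching bullet above can be pictured with a small sketch. This is not Dcup's actual implementation (which is written in Go); it is a Python illustration under the assumption that cached entries are keyed by a hash of the canonicalized schema, so that two schemas differing only in key order share one entry. The names `schema_key` and `SchemaCache` are hypothetical.

```python
import hashlib
import json


def schema_key(schema):
    """Derive a stable cache key from a schema, independent of key order."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class SchemaCache:
    """In-memory cache of prepared schemas, keyed by schema_key()."""

    def __init__(self):
        self._entries = {}
        self.hits = 0
        self.misses = 0

    def get_or_prepare(self, schema, prepare):
        """Return the cached value for schema, calling prepare() on a miss."""
        key = schema_key(schema)
        if key in self._entries:
            self.hits += 1
        else:
            self.misses += 1
            self._entries[key] = prepare(schema)
        return self._entries[key]


cache = SchemaCache()
a = {"total": "number", "customer": {"name": "string"}}
b = {"customer": {"name": "string"}, "total": "number"}  # same schema, different key order

# Stand-in for whatever expensive preparation a real service would do
cache.get_or_prepare(a, prepare=lambda s: sorted(s))
cache.get_or_prepare(b, prepare=lambda s: sorted(s))

print(cache.hits, cache.misses)
```

Because the cache here lives in process memory, a stateless, horizontally scaled deployment would need either a shared cache or a willingness to warm each instance separately; the source does not say which approach Dcup takes.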

Privacy Policy

Your privacy and data security are our top priorities. We do not store your uploaded files or URLs. The only data we temporarily retain is the extracted result for caching purposes, which is automatically deleted after 24 hours.
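
The 24-hour retention rule can be sketched as a time-to-live (TTL) store. The following is an illustration only: the class name `TTLStore`, the injectable clock, and the choice to evict lazily on read are assumptions, and the source does not describe how Dcup actually performs the deletion.

```python
import time

RETENTION_SECONDS = 24 * 60 * 60  # 24 hours, as stated in the policy above


class TTLStore:
    """Keeps extracted results only until their retention window has passed."""

    def __init__(self, ttl=RETENTION_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._items = {}     # key -> (expires_at, value)

    def put(self, key, value):
        self._items[key] = (self._clock() + self._ttl, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: delete on access and report the entry as missing
            del self._items[key]
            return None
        return value
```

Lazy eviction keeps the sketch short, but it only removes an entry when someone asks for it. A service that promises deletion after 24 hours would typically pair this with a periodic sweep or rely on a store with native key expiry.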

