Deterministic PDF to JSON and Markdown

Turn any PDF into structured, deterministic JSON.

Skely reconstructs the structure inside your PDFs — tables with columns, rows, and footers; labeled key-value pairs; multi-column layout — and returns it as JSON or Markdown. The same bytes in always produce the same bytes out, so your output is diffable, hashable, and reproducible.

POST /v1/convert
curl https://api.skely.io/v1/convert \
  -H "Authorization: Bearer sk_live_…" \
  -F file=@invoice.pdf
200 OK · application/json
{
  "request_id": "req_8f2c1a90",
  "status": "success",
  "pages": 1,
  "credits_charged": 1,
  "format": "json",
  "result": {
    "documentType": { "type": "invoice", "subtype": "hotel_folio", "confidence": 0.91 },
    "pages": [ {
      "page": 0,
      "blocks": [ {
        "type": "table",
        "columns": [ { "text": "Date" }, { "text": "Charges" } ],
        "rows": [ { "cells": [
          { "column": "Date", "text": "09-03-23", "semantic": "date" },
          { "column": "Charges", "text": "3.69", "semantic": "currency", "normalized": 3.69 }
        ] } ],
        "footers": [ { "label": "Total" } ]
      } ]
    } ]
  }
}
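A minimal Python sketch of reading this response, assuming the shape shown above is representative of every successful conversion. The helper name `summarize` is hypothetical, and `SAMPLE` is a trimmed copy of the example response, not live API output.

```python
import json

# Trimmed copy of the example response above (table contents omitted).
SAMPLE = json.loads("""
{
  "request_id": "req_8f2c1a90",
  "status": "success",
  "pages": 1,
  "credits_charged": 1,
  "format": "json",
  "result": {
    "documentType": {"type": "invoice", "subtype": "hotel_folio", "confidence": 0.91},
    "pages": [
      {"page": 0, "blocks": [{"type": "table", "columns": [], "rows": [], "footers": []}]}
    ]
  }
}
""")


def summarize(response: dict) -> dict:
    """Collect the top-level facts a caller usually checks first."""
    if response.get("status") != "success":
        raise ValueError(f"conversion failed: {response.get('status')!r}")
    result = response["result"]
    return {
        "request_id": response["request_id"],
        "credits_charged": response["credits_charged"],
        "document_type": result["documentType"]["type"],
        # Block types per page, in the order the API returned them.
        "block_types": [[b["type"] for b in p["blocks"]] for p in result["pages"]],
    }
```

Calling `summarize(SAMPLE)` gives one entry per page in `block_types`, which is a quick way to see what a document contains before walking individual blocks.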
One endpoint, one Bearer key
Tables, key-value & document type
JSON or Markdown back
Deterministic & reproducible
Tables, columns, rows & footers
Key-value & semantic tagging
One endpoint, one key

PDFs hide their structure.
Your parser shouldn't have to guess at it.

Where does the table end?

Rows, columns, and headers live as loose text runs and coordinates, not structure

Will it parse the same tomorrow?

OCR and model-based tools drift — the same PDF can yield different output on every run

Can you audit the result?

If output isn't reproducible, you can't diff it, test it, or trust it in a pipeline

Everything you need to turn PDFs into structured data.

Structured JSON

The document skeleton, fully reconstructed.

Tables, key-value pairs, and clustered data emitted as clean, typed JSON — the same bytes on every run.

Clean Markdown

Readable, diff-friendly, deterministic.

The same reconstructed structure rendered as Markdown — headings, tables, and lists.

Deterministic by Design

Same PDF, same bytes, every time.

No probabilistic models — Skely reads the document's own text and layout. Output is byte-stable and reproducible, so you can diff it, hash it, and trust it in production.
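Byte-stable output means a stored hash is enough to detect any change. The sketch below shows one way to use that in a regression test, assuming the engine version is unchanged between runs. The function names are hypothetical; only the standard library `hashlib` is used.

```python
import hashlib


def output_digest(output: bytes) -> str:
    """SHA-256 hex digest of the raw response body."""
    return hashlib.sha256(output).hexdigest()


def is_unchanged(output: bytes, expected_digest: str) -> bool:
    """True when a fresh conversion matches a previously stored digest."""
    return output_digest(output) == expected_digest
```

Hash the raw bytes as returned rather than a re-serialized copy: parsing and re-dumping JSON can change whitespace or key order and would defeat the comparison.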

Table Extraction

Rows and columns, not loose text.

Skely rebuilds tables from the page's underlying layout — headers, cells, and footer rows resolved into structured rows.

Key-Value Extraction

Fields and labels, paired automatically.

Invoices, forms, and statements carry labeled fields. Skely pairs each label with its value and returns them as structured key-value data — no template setup required.

And we're just getting started.

Conversion History

Every conversion logged with its request id and output hash, giving you a verifiable, replayable record of exactly what was returned.

Usage Analytics

See pages converted and credits remaining across your subscription and purchased buckets in one dashboard.

Tables, reconstructed — headers, rows, and totals as structured JSON.

Skely rebuilds a table from the page's own text and vector layout: it finds the header row, aligns every value column beneath it, types each cell, and pulls trailing summary rows like Total and Balance into a separate footers array. The same bytes on every run.

Hotel folio, page 1
result.pages[0].blocks[n]
  • Header row detected, every column reported with its text and alignment — no ruling lines required.
  • Each cell carries its column, text, and bounds; dates and amounts are tagged semantic, and amounts get a numeric normalized value.
  • Summary rows are lifted into a distinct footers array, each with its label (Total, Balance) and value cells — kept out of your data rows.

Multi-line stacked headers (a group label sitting over its sub-columns) and divider-delimited header cells (a / b) are merged or split into the right columns automatically.

columns · rows[].cells[] · footers[] · semantic · normalized
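As an illustration of consuming a table block, here is a minimal Python sketch that flattens rows into plain records and keeps footer labels separate. It assumes the field names from the example response at the top of the page (`rows`, `cells`, `column`, `text`, `normalized`, `footers`, `label`); the function name `table_to_records` is hypothetical.

```python
def table_to_records(block: dict) -> tuple[list[dict], list[str]]:
    """Flatten a table block into row records plus a list of footer labels."""
    if block.get("type") != "table":
        raise ValueError("not a table block")
    records = []
    for row in block.get("rows", []):
        record = {}
        for cell in row["cells"]:
            # Prefer the parsed value when one was supplied, else the raw text.
            record[cell["column"]] = cell.get("normalized", cell["text"])
        records.append(record)
    # Summary rows stay out of the data rows, mirroring the footers array.
    footer_labels = [footer["label"] for footer in block.get("footers", [])]
    return records, footer_labels
```

Because totals live in `footers`, summing a column over the returned records does not double-count the Total row.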

Skely doesn’t just find the value. It types it.

Every leaf in the output carries a semantic tag, and where the format is unambiguous Skely adds a parsed, ready-to-use value. Pure format matching, applied identically on every run.

Dates

Date-shaped strings, tagged and ISO-normalized.

Unambiguous, four-digit-year dates get a parsed normalized value. A two-digit year is tagged but left un-normalized — no century guessing.

Currencies & amounts

Money read exactly, never inferred.

Two-decimal strings are tagged currency with a normalized number equal to what’s written; a symbol or ISO code adds a unit.

Addresses

Recognized line-by-line and as a block.

Lines are tagged street, locality, phone, and more; a cluster holding two or more is tagged address as a whole.

The same format matching also tags phone, fax, email, url, and percent. No dictionaries, no locale tables, no network calls.
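On the consuming side, the tags make type conversion a small dispatch. The sketch below is one way to do it, under two assumptions that are not confirmed here: that a normalized date arrives as an ISO `YYYY-MM-DD` string, and that a normalized amount arrives as a JSON number. The function name `typed_value` is hypothetical.

```python
from datetime import date
from decimal import Decimal


def typed_value(cell: dict):
    """Return a Python value for a cell, falling back to its raw text."""
    semantic = cell.get("semantic")
    normalized = cell.get("normalized")
    if normalized is None:
        # Tagged but not normalized (for example a two-digit-year date).
        return cell["text"]
    if semantic == "date":
        # Assumption: normalized dates are ISO "YYYY-MM-DD" strings.
        return date.fromisoformat(normalized)
    if semantic == "currency":
        # Go through str() so the Decimal matches the written digits.
        return Decimal(str(normalized))
    return normalized
```

A two-digit-year cell such as `09-03-23` comes back as its original text, which leaves the century decision to your own pipeline.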

Skely tells you what the document is, before you parse it.

Every conversion leads with a documentType verdict — a high-level type, a finer subtype when one fits, a confidence score, ranked alternatives, and the exact signals that decided it. One deterministic classifier, run over the reconstructed structure. No model, no network.

A hotel bill

A folio with a charges table, a guest panel, and a Total line.

invoice / hotel_folio · confidence 0.91

A point-of-sale slip

A short receipt with line items and a payment total.

invoice / receipt

A multi-page agreement

Defined terms, numbered clauses, a signature block.

contract

The verdict, in the response

type, subtype, confidence, alternatives, signals.

Skely resolves about 16 top-level types and names a finer subtype when a variant rule wins. When nothing scores high enough, the subtype is left off and the top-level type still resolves. Low-evidence files come back as "unknown" at confidence 0 — never a guess dressed up as an answer.
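A minimal sketch of routing on the verdict, using the `type`, `subtype`, and `confidence` fields named above. The `route` function, the `manual_review` label, and the 0.5 threshold are illustrative assumptions, not part of the API.

```python
def route(document_type: dict, min_confidence: float = 0.5) -> str:
    """Pick a handler key from a documentType verdict."""
    doc_type = document_type.get("type", "unknown")
    confidence = document_type.get("confidence", 0)
    if doc_type == "unknown" or confidence < min_confidence:
        # Low-evidence files are sent to a person instead of a parser.
        return "manual_review"
    # The subtype is optional, so fall back to the top-level type alone.
    subtype = document_type.get("subtype")
    return f"{doc_type}/{subtype}" if subtype else doc_type
```

Treating a missing subtype as normal keeps the router working for documents where only the top-level type resolves.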

One PDF. Every region, in reading order.

Real pages aren't a single column. Skely clusters loose text into coherent blocks, keeps tables intact, and orders every block top-to-bottom then left-to-right — so a side-by-side letterhead, bill-to, and guest panel come back correctly sequenced, each with its own bounds.
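The ordering rule itself, top-to-bottom and then left-to-right, can be sketched as a two-key sort. This is only an illustration of the rule, not Skely's clustering logic, and the bounds shape used here (`x` and `y` for the top-left corner, with `y` growing downward) is an assumption; the real bounds fields may differ.

```python
def reading_order(blocks: list[dict]) -> list[dict]:
    """Sort blocks top-to-bottom, then left-to-right, by their top-left corner."""
    # Assumption: each block has bounds {"x": ..., "y": ...}, y growing downward.
    return sorted(blocks, key=lambda b: (b["bounds"]["y"], b["bounds"]["x"]))
```

A sort like this is mainly useful for restoring order after you filter or merge blocks from several pages yourself; the API already returns blocks sequenced.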

Clusters · Tables · Key-Value · Multi-column reading order

Three steps from PDF to structured data.

1

Send your PDF

POST a PDF with an embedded text layer to the API with a Bearer key, or drop it into the Online Convert tool. No setup, no templates.

2

Skely reconstructs the skeleton

Deterministically, by reading the document's own text and layout. Tables, key-value pairs, and clustered data are resolved from the underlying structure.

3

Get byte-stable JSON or Markdown

The same input always returns the same bytes — output you can diff, hash, and audit. We hand back the structure; your pipeline takes it from there.
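Step 1 can be sketched in Python with only the standard library. The URL and the `file` form field come from the curl example at the top of the page; the function name `build_convert_request` is hypothetical, and the request is built without being sent so the sketch stays free of network calls.

```python
import urllib.request
import uuid

API_URL = "https://api.skely.io/v1/convert"


def build_convert_request(pdf_bytes: bytes, filename: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a multipart POST equivalent to the curl call."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        pdf_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, pass the request to `urllib.request.urlopen` and read the response body; keep those raw bytes if you plan to hash or diff the output.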

Building an integration? Read the API documentation — endpoints, parameters, the error catalog, and copy-paste code samples.

Start free. Scale when you're ready.

250 pages a month, free forever. No credit card required. 1 credit = 1 page. See the full plans and pricing.

Free
$0 per month
This includes:
250 pages/month included
Structured JSON + Markdown output
Tables & key-value extraction
Deterministic, byte-stable output
Online Convert web tool
No credit card required
Standard
$5 per month
This includes:
5,000 pages/month included
Full API access with Bearer API keys
Structured JSON + Markdown output
Tables, key-value & multi-column layout extraction
Deterministic, reproducible output
Online Convert web tool
Pro
$15 per month
This includes:
Everything in Standard, plus:
20,000 pages/month included
Top-ups: $10 for 20,000 credits that never expire
Subscription pages reset monthly; top-up credits persist
Priority API throughput

Frequently asked questions

250 pages per month at no cost, with full API access and the Online Convert tool. 1 credit = 1 page. No credit card required.

Skely reads the document's own text and layout data and reconstructs its structure deterministically — the same input bytes always produce the same output bytes on a pinned engine version. Every conversion leads with a documentType (type, subtype, and a confidence score), then returns the page as ordered blocks: tables with their columns, rows, cells, and footer rows (Total, Balance, and similar), plus clustered text blocks and labeled key-value pairs. There is no recognition guesswork — what is written is what you get back.

A documentType classification, then each page as a list of blocks in reading order (top-to-bottom, then left-to-right). Table blocks carry columns[] with detected per-column alignment, rows[].cells[], and a separate footers[] array for summary rows. Cluster blocks carry text and key-value (kv) entries, with their bounds preserved so you can re-render the original layout. Detected dates, amounts, phone, fax, email, URL, and address values are tagged with a semantic field, and unambiguous amounts and full-year dates also carry a normalized value. Every non-blank glyph is retained — anything the recognizers don't place is still returned in a catch-all block flagged residual, so nothing is silently dropped.

PDFs that carry their own embedded text and vector layer — the files most invoices, statements, folios, reports, and forms are generated as. Skely reads that layer directly; it does not rasterize the page or read pixels, so scanned-image PDFs, photos, and handwriting yield little or no text and are out of scope today.

1 credit = 1 page. You have two buckets. Subscription credits come with your plan, reset to your plan's allotment each month with no carryover, and are always drawn down first. Purchased top-up credits are charged only after your subscription credits run out and never expire while your account is open. Pro plans can buy top-ups at $10 for 20,000 credits.
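The drawdown order can be worked through as arithmetic. This is a sketch of the rule as described, not Skely's billing code; the function name `charge` is hypothetical, and raising an error when both buckets are short is an assumption, since the behavior on insufficient credits is not stated here.

```python
def charge(pages: int, subscription: int, top_up: int) -> tuple[int, int]:
    """Return (subscription, top_up) balances after converting `pages` pages."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    if pages > subscription + top_up:
        # Assumption: a request that exceeds both buckets is rejected.
        raise ValueError("insufficient credits")
    # Subscription credits are always drawn down first.
    from_subscription = min(pages, subscription)
    from_top_up = pages - from_subscription
    return subscription - from_subscription, top_up - from_top_up
```

For example, converting 300 pages with 250 subscription credits and 100 top-up credits uses all 250 subscription credits and 50 top-up credits, leaving balances of 0 and 50.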

No. Credits are a prepaid, non-refundable consumable with no cash value, redeemable only for page conversions on your account. Subscription credits reset each month and do not carry over; purchased credits stay available until you use them or close your account.

Closing your account permanently deletes the account and all associated data — your profile, API keys, stored files, and usage records. Any remaining credit balance is tied to that account and cannot be recovered or transferred after closure. If we terminate an account for cause (for example abuse or violation of the terms), access ends and the same applies. If Skely discontinues the service entirely, we apply a pro-rata refund for the unused purchased credits remaining in your account.

Yes. Manage or cancel your subscription at any time from your account settings. There are no long-term contracts and no cancellation fees. Purchased top-up credits remain available while your account is open.

Skely is API-first. Create a Bearer API key from your account, send it in the Authorization header, and start converting. The Online Convert tool in the dashboard runs on the same backend and the same account credits as the API, so output and billing are identical either way. See the API documentation for endpoints, parameters, and code samples.

Stop parsing PDFs by hand.

Get deterministic, byte-stable JSON and Markdown from your PDFs — tables, key-value pairs, and document structure in one API call. 250 pages free, no credit card required.

Read the docs