Turn any PDF into structured, deterministic JSON.
curl https://api.skely.io/v1/convert \
-H "Authorization: Bearer sk_live_…" \
-F file=@invoice.pdf
{
"request_id": "req_8f2c1a90",
"status": "success",
"pages": 1,
"credits_charged": 1,
"format": "json",
"result": {
"documentType": { "type": "invoice", "subtype": "hotel_folio", "confidence": 0.91 },
"pages": [ {
"page": 0,
"blocks": [ {
"type": "table",
"columns": [ { "text": "Date" }, { "text": "Charges" } ],
"rows": [ { "cells": [
{ "column": "Date", "text": "09-03-23", "semantic": "date" },
{ "column": "Charges", "text": "3.69", "semantic": "currency", "normalized": 3.69 }
] } ],
"footers": [ { "label": "Total" } ]
} ]
} ]
}
}
PDFs hide their structure. Your parser shouldn't have to guess at it.
Where does the table end?
Rows, columns, and headers live as loose text runs and coordinates, not structure
Will it parse the same tomorrow?
OCR and model-based tools drift — the same PDF can yield different output on every run
Can you audit the result?
If output isn't reproducible, you can't diff it, test it, or trust it in a pipeline
Everything you need to turn PDFs into structured data.
Structured JSON
The document skeleton, fully reconstructed.
Tables, key-value pairs, and clustered data emitted as clean, typed JSON — the same bytes on every run.
Clean Markdown
Readable, diff-friendly, deterministic.
The same reconstructed structure rendered as Markdown — headings, tables, and lists.
Deterministic by Design
Same PDF, same bytes, every time.
No probabilistic models — Skely reads the document's own text and layout. Output is byte-stable and reproducible, so you can diff it, hash it, and trust it in production.
Table Extraction
Rows and columns, not loose text.
Skely rebuilds tables from the page's underlying layout — headers, cells, and footer rows resolved into structured rows.
Key-Value Extraction
Fields and labels, paired automatically.
Invoices, forms, and statements carry labeled fields. Skely pairs each label with its value and returns them as structured key-value data — no template setup required.
And we're just getting started.
Tables, reconstructed — headers, rows, and totals as structured JSON.
Skely rebuilds a table from the page's own text and vector layout: it finds the header row, aligns every value column beneath it, types each cell, and pulls trailing summary rows like Total and Balance into a separate footers array. The same bytes on every run.
- Header row detected, every column reported with its text and alignment; no ruling lines required.
- Each cell carries its column, text, and bounds; dates and amounts are tagged semantic, and amounts get a numeric normalized value.
- Summary rows are lifted into a distinct footers array, each with its label (Total, Balance) and value cells, kept out of your data rows.
Multi-line stacked headers (a group label sitting over its sub-columns) and divider-delimited header cells (a / b) are merged or split into the right columns automatically.
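As a rough illustration of consuming this output, the sketch below flattens each table block into plain records and keeps the footer labels apart. It assumes only the field names visible in the sample response at the top of this page (result.pages[].blocks[], rows[].cells[], footers[].label); the helper name table_records is hypothetical and not part of any Skely client library.

```python
import json

# Trimmed copy of the sample response shown at the top of this page.
SAMPLE = json.loads("""
{
  "result": {
    "pages": [ {
      "page": 0,
      "blocks": [ {
        "type": "table",
        "columns": [ { "text": "Date" }, { "text": "Charges" } ],
        "rows": [ { "cells": [
          { "column": "Date", "text": "09-03-23", "semantic": "date" },
          { "column": "Charges", "text": "3.69", "semantic": "currency", "normalized": 3.69 }
        ] } ],
        "footers": [ { "label": "Total" } ]
      } ]
    } ]
  }
}
""")


def table_records(response):
    """Yield (records, footer_labels) for every table block, in page order.

    Hypothetical helper: field names follow the sample response only.
    """
    for page in response["result"]["pages"]:
        for block in page["blocks"]:
            if block.get("type") != "table":
                continue
            # One dict per data row, keyed by the column each cell names.
            records = [
                {cell["column"]: cell["text"] for cell in row["cells"]}
                for row in block.get("rows", [])
            ]
            # Summary rows arrive separately, so they never mix with data rows.
            footer_labels = [f.get("label") for f in block.get("footers", [])]
            yield records, footer_labels


tables = list(table_records(SAMPLE))
```

Because footers are a separate array, a total can be checked against the summed data rows without first filtering summary lines out of them.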
Skely doesn’t just find the value. It types it.
Every leaf in the output carries a semantic tag, and where the format is unambiguous Skely adds a parsed, ready-to-use value. Pure format matching, applied identically on every run.
Dates
Date-shaped strings, tagged and ISO-normalized.
Unambiguous, four-digit-year dates get a parsed normalized value. A two-digit year is tagged but left un-normalized — no century guessing.
Currencies & amounts
Money read exactly, never inferred.
Two-decimal strings are tagged currency with a normalized number equal to what’s written; a symbol or ISO code adds a unit.
Addresses
Recognized line-by-line and as a block.
Lines are tagged street, locality, phone, and more; a cluster holding two or more is tagged address as a whole.
The same model also tags phone, fax, email, url, and percent. No dictionaries, no locale tables, no network calls.
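On the consuming side, the tagging rules above suggest a simple pattern: use normalized when it is present and fall back to the written text when it is not. The sketch below is one way to do that, assuming the cell shape from the sample response (text always present, semantic and normalized optional). The helper name cell_value is hypothetical, and converting amounts to Decimal is a choice made here, not something the API does.

```python
from decimal import Decimal


def cell_value(cell):
    """Return a typed value for a cell, falling back to the raw text.

    Assumes the cell shape from the sample response: "text" is always
    present, "semantic" and "normalized" are optional.
    """
    if cell.get("semantic") == "currency" and "normalized" in cell:
        # Go through str() so the Decimal matches the written digits
        # instead of the float's binary expansion.
        return Decimal(str(cell["normalized"]))
    # Anything without a normalized value (for example a two-digit-year
    # date) is returned exactly as written.
    return cell.get("normalized", cell["text"])


date_cell = {"column": "Date", "text": "09-03-23", "semantic": "date"}
amount_cell = {"column": "Charges", "text": "3.69", "semantic": "currency", "normalized": 3.69}

date_value = cell_value(date_cell)
amount_value = cell_value(amount_cell)
```

Here the two-digit-year date stays a string, which mirrors the "no century guessing" rule: the caller decides how to interpret it.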
Skely tells you what the document is, before you parse it.
Every conversion leads with a documentType verdict — a high-level type, a finer subtype when one fits, a confidence score, ranked alternatives, and the exact signals that decided it. One deterministic classifier, run over the reconstructed structure. No model, no network.
A hotel bill
A folio with a charges table, a guest panel, and a Total line.
invoice / hotel_folio confidence 0.91
A point-of-sale slip
A short receipt with line items and a payment total.
invoice / receipt
A multi-page agreement
Defined terms, numbered clauses, a signature block.
contract
The verdict, in the response
type, subtype, confidence, alternatives, signals.
Skely resolves about 16 top-level types and names a finer subtype when a variant rule wins. When nothing scores high enough, the subtype is left off and the top-level type still resolves. Low-evidence files come back as "unknown" at confidence 0 — never a guess dressed up as an answer.
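A pipeline could branch on this verdict before touching the page content. The sketch below is one possible routing rule, assuming the documentType fields shown in the sample response (type, subtype, confidence). The function name route, the "manual_review" label, and the min_confidence threshold are all hypothetical choices made for illustration, not values Skely defines.

```python
def route(document_type, min_confidence=0.5):
    """Pick a handler key from a documentType verdict.

    min_confidence is an illustrative, caller-chosen threshold.
    """
    doc_type = document_type.get("type", "unknown")
    confidence = document_type.get("confidence", 0)
    # Low-evidence files come back as "unknown" at confidence 0.
    if doc_type == "unknown" or confidence < min_confidence:
        return "manual_review"
    # The subtype is left off when no variant rule wins.
    subtype = document_type.get("subtype")
    return f"{doc_type}/{subtype}" if subtype else doc_type


folio = route({"type": "invoice", "subtype": "hotel_folio", "confidence": 0.91})
contract = route({"type": "contract", "confidence": 0.8})
unknown = route({"type": "unknown", "confidence": 0})
```

Keying handlers on "type/subtype" with a fallback to the bare type keeps the caller working when only the top-level type resolves.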
One PDF. Every region, in reading order.
Real pages aren't a single column. Skely clusters loose text into coherent blocks, keeps tables intact, and orders every block top-to-bottom then left-to-right — so a side-by-side letterhead, bill-to, and guest panel come back correctly sequenced, each with its own bounds.
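The ordering rule itself (top-to-bottom, then left-to-right) can be pictured as a two-key sort. The sketch below only illustrates that rule; it is not Skely's algorithm, and the API already returns blocks in order. The bounds shape used here (x and y with the origin at the top-left of the page) and the name field are assumptions, since this page does not show the real bounds format.

```python
def reading_order(blocks):
    """Sort blocks top-to-bottom, then left-to-right.

    Assumes each block has bounds {"x": ..., "y": ...} with the origin at
    the top-left of the page; the real bounds shape may differ.
    """
    return sorted(blocks, key=lambda b: (b["bounds"]["y"], b["bounds"]["x"]))


# Three side-by-side regions on one row, then a table below them.
blocks = [
    {"name": "guest panel", "bounds": {"x": 400, "y": 40}},
    {"name": "charges table", "bounds": {"x": 40, "y": 200}},
    {"name": "letterhead", "bounds": {"x": 40, "y": 40}},
    {"name": "bill-to", "bounds": {"x": 220, "y": 40}},
]
ordered = [b["name"] for b in reading_order(blocks)]
```

A plain sort like this treats blocks as sharing a row only when their y values are exactly equal; real layouts would need some tolerance for near-equal tops.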
Three steps from PDF to structured data.
Send your PDF
POST a PDF with an embedded text layer to the API with a Bearer key, or drop it into the Online Convert tool. No setup, no templates.
Skely reconstructs the skeleton
Deterministically, by reading the document's own text and layout. Tables, key-value pairs, and clustered data are resolved from the underlying structure.
Get byte-stable JSON or Markdown
The same input always returns the same bytes — output you can diff, hash, and audit. We hand back the structure; your pipeline takes it from there.
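One way to check this property in a pipeline is to fingerprint the output of two runs and compare. The sketch below hashes only the result object, on the assumption that envelope fields such as request_id differ between requests; that assumption, the function name result_fingerprint, and the placeholder request ids are not confirmed by this page. Because the dict is re-serialized, this compares content rather than the raw bytes on the wire.

```python
import hashlib
import json


def result_fingerprint(response: dict) -> str:
    """SHA-256 over a canonical serialization of the "result" object."""
    canonical = json.dumps(
        response["result"],
        sort_keys=True,           # key order must not affect the hash
        separators=(",", ":"),    # no incidental whitespace
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two hypothetical runs over the same PDF: different envelopes, same result.
run_a = {"request_id": "req_a", "result": {"pages": [{"page": 0, "blocks": []}]}}
run_b = {"request_id": "req_b", "result": {"pages": [{"page": 0, "blocks": []}]}}
same = result_fingerprint(run_a) == result_fingerprint(run_b)
```

Storing the fingerprint next to each converted file lets a later run flag any change in output with a single string comparison.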
Building an integration? Read the API documentation — endpoints, parameters, the error catalog, and copy-paste code samples.
Start free. Scale when you're ready.
250 pages a month, free forever. No credit card required. 1 credit = 1 page. See the full plans and pricing.
Frequently asked questions
250 pages per month at no cost, with full API access and the Online Convert tool. 1 credit = 1 page. No credit card required.
Skely reads the document's own text and layout data and reconstructs its structure deterministically — the same input bytes always produce the same output bytes on a pinned engine version. Every conversion leads with a documentType (type, subtype, and a confidence score), then returns the page as ordered blocks: tables with their columns, rows, cells, and footer rows (Total, Balance, and similar), plus clustered text blocks and labeled key-value pairs. There is no recognition guesswork — what is written is what you get back.
A documentType classification, then each page as a list of blocks in reading order (top-to-bottom, then left-to-right). Table blocks carry columns[] with detected per-column alignment, rows[].cells[], and a separate footers[] array for summary rows. Cluster blocks carry text and key-value (kv) entries, with their bounds preserved so you can re-render the original layout. Detected dates, amounts, phone, fax, email, URL, and address values are tagged with a semantic field, and unambiguous amounts and full-year dates also carry a normalized value. Every non-blank glyph is retained — anything the recognizers don't place is still returned in a catch-all block flagged residual, so nothing is silently dropped.
PDFs that carry their own embedded text and vector layer — the files most invoices, statements, folios, reports, and forms are generated as. Skely reads that layer directly; it does not rasterize the page or read pixels, so scanned-image PDFs, photos, and handwriting yield little or no text and are out of scope today.
1 credit = 1 page. You have two buckets. Subscription credits come with your plan, reset to your plan's allotment each month with no carryover, and are always drawn down first. Purchased top-up credits are charged only after your subscription credits run out and never expire while your account is open. Pro plans can buy top-ups at $10 for 20,000 credits.
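The drawdown order can be worked through as simple arithmetic. The sketch below models the two buckets as described above (subscription credits first, then purchased top-ups, 1 credit per page). The function name charge is hypothetical, and raising an error on a shortfall is an assumption, since this page does not say how the service handles one.

```python
def charge(pages, subscription, purchased):
    """Return (subscription_left, purchased_left) after converting `pages`.

    Subscription credits are drawn first; top-ups cover the remainder.
    Raising ValueError on a shortfall is an assumption made for this sketch.
    """
    if pages > subscription + purchased:
        raise ValueError("not enough credits")
    from_subscription = min(pages, subscription)
    from_purchased = pages - from_subscription
    return subscription - from_subscription, purchased - from_purchased


# 300 pages against 250 subscription credits and 100 purchased credits:
# the subscription bucket empties and 50 pages come out of the top-up.
after_large_job = charge(300, 250, 100)
# 100 pages fit inside the subscription bucket, so the top-up is untouched.
after_small_job = charge(100, 250, 100)
```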
No. Credits are a prepaid, non-refundable consumable with no cash value, redeemable only for page conversions on your account. Subscription credits reset each month and do not carry over; purchased credits stay available until you use them or close your account.
Closing your account permanently deletes the account and all associated data — your profile, API keys, stored files, and usage records. Any remaining credit balance is tied to that account and cannot be recovered or transferred after closure. If we terminate an account for cause (for example abuse or violation of the terms), access ends and the same applies. If Skely discontinues the service entirely, we apply a pro-rata refund for the unused purchased credits remaining in your account.
Yes. Manage or cancel your subscription at any time from your account settings. There are no long-term contracts and no cancellation fees. Purchased top-up credits remain available while your account is open.
Skely is API-first. Create a Bearer API key from your account, send it in the Authorization header, and start converting. The Online Convert tool in the dashboard runs on the same backend and the same account credits as the API, so output and billing are identical either way. See the API documentation for endpoints, parameters, and code samples.
Stop parsing PDFs by hand.
Get deterministic, byte-stable JSON and Markdown from your PDFs — tables, key-value pairs, and document structure in one API call. 250 pages free, no credit card required.