Document Extraction
Document extraction is the Document Insights flow for turning a document into structured fields. It is asynchronous: you create an extraction, receive an ID, and then either poll for the result or receive a webhook event.
The id returned from POST /document-insights/extractions is the extractionId you use for polling, feedback, and event correlation.
Before you start
You need:
- An API key. See Authentication.
- A document, either as a fileId from the Files API or as a public url.
- Optionally, a hubId if you want to route directly to a known Document Insights hub.
- Optionally, metadata for client correlation and automatic hub routing.
Extraction lifecycle
Terminal statuses are COMPLETED, FAILED, and REJECTED.
Create an extraction from an uploaded file
Create an extraction from a URL
Use a URL when the document is already available to Plextera over HTTPS.
Use fileId when your integration already uploads files to Plextera. Use url when your source system can provide a stable HTTPS download URL.
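The choice between fileId and url can be sketched as a small request-body builder. This is a minimal sketch, not the official client: the function name build_extraction_request is hypothetical, and it assumes that exactly one of fileId or url is sent per request and that non-HTTPS URLs should be rejected before the call is made. Sending the body (base URL, authentication header) is left out because this page does not specify it.

```python
from typing import Optional


def build_extraction_request(
    file_id: Optional[str] = None,
    url: Optional[str] = None,
    hub_id: Optional[str] = None,
    metadata: Optional[dict] = None,
) -> dict:
    """Build a JSON-serializable body for POST /document-insights/extractions."""
    # Assumption: a request carries exactly one document source.
    if (file_id is None) == (url is None):
        raise ValueError("provide exactly one of file_id or url")
    # Assumption: reject non-HTTPS URLs client-side, since the document
    # must be reachable over HTTPS.
    if url is not None and not url.lower().startswith("https://"):
        raise ValueError("url must be an HTTPS URL")

    body: dict = {"fileId": file_id} if file_id is not None else {"url": url}
    # hubId and metadata are optional, so they are only included when given.
    if hub_id is not None:
        body["hubId"] = hub_id
    if metadata is not None:
        body["metadata"] = dict(metadata)
    return body
```

For example, build_extraction_request(file_id="file_123") yields a body with only fileId, while passing url and hub_id yields a body with url and hubId.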
Hub routing
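You can route an extraction in two ways:
- Pass hubId to route the extraction directly to a known Document Insights hub.
- Pass metadata and let Document Insights route the extraction to a hub automatically.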
metadata keys and values must be non-empty strings. Maximum: 50 entries, 64 characters per key, and 512 characters per value.
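The metadata limits above can be checked before a request is sent. The following is a minimal sketch under stated assumptions: validate_metadata is a hypothetical helper, and it assumes that the service counts characters the same way Python's len does for strings.

```python
MAX_ENTRIES = 50
MAX_KEY_LENGTH = 64
MAX_VALUE_LENGTH = 512


def validate_metadata(metadata: dict) -> list:
    """Return a list of problems; an empty list means the metadata is within limits."""
    problems = []
    if len(metadata) > MAX_ENTRIES:
        problems.append(f"too many entries: {len(metadata)} > {MAX_ENTRIES}")
    for key, value in metadata.items():
        # Keys must be non-empty strings of at most 64 characters.
        if not isinstance(key, str) or key == "":
            problems.append(f"key {key!r} must be a non-empty string")
        elif len(key) > MAX_KEY_LENGTH:
            problems.append(f"key {key[:16]!r}... exceeds {MAX_KEY_LENGTH} characters")
        # Values must be non-empty strings of at most 512 characters.
        if not isinstance(value, str) or value == "":
            problems.append(f"value for key {key!r} must be a non-empty string")
        elif len(value) > MAX_VALUE_LENGTH:
            problems.append(f"value for key {key!r} exceeds {MAX_VALUE_LENGTH} characters")
    return problems
```

Returning a list of problems instead of raising on the first one lets a caller report every violation at once.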
Poll for output
Call GET /document-insights/extractions/{extractionId} until the extraction reaches a terminal status.
When status is COMPLETED, outputAvailable is true and output contains the extracted fields.
Avoid tight polling loops. Poll with a delay and stop as soon as the status is COMPLETED, FAILED, or REJECTED.
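A polling loop with a delay might look like the sketch below. It is illustrative only: poll_extraction and its parameters are hypothetical names, the default delay and attempt limit are arbitrary choices rather than documented recommendations, and fetch_extraction stands in for whatever function your integration uses to call GET /document-insights/extractions/{extractionId} and return the parsed JSON.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"COMPLETED", "FAILED", "REJECTED"}


def poll_extraction(
    fetch_extraction: Callable[[str], dict],
    extraction_id: str,
    delay_seconds: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Fetch the extraction repeatedly until it reaches a terminal status."""
    for attempt in range(max_attempts):
        extraction = fetch_extraction(extraction_id)
        # Stop as soon as the status is COMPLETED, FAILED, or REJECTED.
        if extraction.get("status") in TERMINAL_STATUSES:
            return extraction
        # Wait between attempts, but not after the final one.
        if attempt < max_attempts - 1:
            sleep(delay_seconds)
    raise TimeoutError(
        f"extraction {extraction_id} not terminal after {max_attempts} attempts"
    )
```

Passing sleep as a parameter keeps the loop testable without real waiting. The caller still needs to branch on the returned status, since FAILED and REJECTED are returned rather than raised.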
Receive extraction events
If you do not want to poll, create an event subscription for:
- document-insights.extraction.completed
- document-insights.extraction.failed
- document-insights.extraction.rejected
For completed extractions, the event payload includes the same completed extraction model as GET /document-insights/extractions/{extractionId}, including output.
See Event Subscriptions for setup, headers, signatures, and retry behavior.
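Once an event has been received and verified, handling it can be sketched as a lookup from event type to terminal status. This is a hedged sketch: summarize_extraction_event is a hypothetical helper, the way the event type and extraction model are unpacked from the delivery envelope is not covered here (see Event Subscriptions), and it assumes that the extraction model in every event carries the id field.

```python
EVENT_STATUS = {
    "document-insights.extraction.completed": "COMPLETED",
    "document-insights.extraction.failed": "FAILED",
    "document-insights.extraction.rejected": "REJECTED",
}


def summarize_extraction_event(event_type: str, extraction: dict) -> dict:
    """Map an extraction event to the fields a consumer typically acts on."""
    status = EVENT_STATUS.get(event_type)
    if status is None:
        raise ValueError(f"unhandled event type: {event_type}")
    # Assumption: the extraction model in the payload includes "id".
    summary = {"extractionId": extraction.get("id"), "status": status}
    # Only completed extractions are described as including output.
    if status == "COMPLETED":
        summary["output"] = extraction.get("output")
    return summary
```

The extractionId in the summary is what correlates the event with the extraction you created earlier.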
Submit feedback
Use feedback when a value is incorrect or the extraction needs review.
message is required and can contain up to 1024 characters. fieldId is optional; include it when the feedback applies to one extracted field.
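The feedback constraints can be enforced when building the request body, as in this minimal sketch. build_feedback is a hypothetical helper, and the feedback endpoint path is deliberately omitted because it is not stated on this page; see the API Reference for it.

```python
from typing import Optional

MAX_MESSAGE_LENGTH = 1024


def build_feedback(message: str, field_id: Optional[str] = None) -> dict:
    """Build a feedback body with a required message and an optional fieldId."""
    if not message:
        raise ValueError("message is required")
    if len(message) > MAX_MESSAGE_LENGTH:
        raise ValueError(f"message exceeds {MAX_MESSAGE_LENGTH} characters")
    body = {"message": message}
    # Include fieldId only when the feedback targets one extracted field.
    if field_id is not None:
        body["fieldId"] = field_id
    return body
```

Omitting field_id produces feedback about the extraction as a whole, while passing it ties the message to a single extracted field.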
Related reference
- API Reference - Document Insights endpoints and schemas
- Event Reference - Document Insights event payloads