Document Extraction

Document extraction is the Document Insights flow for turning a document into structured fields. It is asynchronous: you create an extraction, receive an ID, and then either poll for the result or receive a webhook event.

The id returned from POST /document-insights/extractions is the extractionId you use for polling, feedback, and event correlation.

Before you start

You need:

  • An API key. See Authentication.
  • A document, either as a fileId from the Files API or as a public url.
  • Optionally, a hubId if you want to route directly to a known Document Insights hub.
  • Optionally, metadata for client correlation and automatic hub routing.

Extraction lifecycle

Status      Meaning
QUEUED      The extraction was accepted and is waiting to be processed.
PROCESSING  Document Insights is processing the document.
COMPLETED   Extraction finished successfully and output is available.
FAILED      Processing started but could not complete. Check error.
REJECTED    The document was rejected before or during validation. Check error.

Terminal statuses are COMPLETED, FAILED, and REJECTED.


Create an extraction from an uploaded file

1. Upload the document

    curl https://api.plextera.com/api/public/v1/files \
      -H "Authorization: api-key YOUR_API_KEY" \
      -F "file=@lab-result.pdf"

2. Submit the extraction

    curl -X POST https://api.plextera.com/api/public/v1/document-insights/extractions \
      -H "Authorization: api-key YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "document": {
          "fileId": "file_01JY7M4ZVX5R1P3M3Q0TA1S7ZM"
        },
        "metadata": {
          "customerDocumentId": "lab-42",
          "sourceSystem": "patient-portal"
        }
      }'

3. Store the returned ID

    {
      "id": "69654f0bc073ef404baec649",
      "operation": "extract",
      "status": "QUEUED",
      "outputAvailable": false,
      "metadata": {
        "customerDocumentId": "lab-42",
        "sourceSystem": "patient-portal"
      }
    }
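
The submit step above can be sketched in Python using only the standard library. This is an illustrative sketch, not an official client: the function names build_extraction_request and submit_extraction are hypothetical, and error handling for non-2xx responses is left out.

```python
import json
import urllib.request

BASE_URL = "https://api.plextera.com/api/public/v1"


def build_extraction_request(api_key, file_id, metadata=None):
    """Build (but do not send) the POST request that creates an extraction."""
    body = {"document": {"fileId": file_id}}
    if metadata:
        body["metadata"] = metadata
    return urllib.request.Request(
        f"{BASE_URL}/document-insights/extractions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"api-key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit_extraction(request):
    """Send the request and return the id field of the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["id"]
```

Building the request separately from sending it lets you inspect or log the exact body before any network call. Store the value returned by submit_extraction; it is the extractionId used in the later steps.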

Create an extraction from a URL

Use a URL when the document is already available to Plextera over HTTPS.

    curl -X POST https://api.plextera.com/api/public/v1/document-insights/extractions \
      -H "Authorization: api-key YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "document": {
          "url": "https://example.com/documents/lab-result.pdf",
          "fileName": "lab-result.pdf"
        },
        "metadata": {
          "customerDocumentId": "lab-42"
        }
      }'

Use fileId when your integration already uploads files to Plextera. Use url when your source system can provide a stable HTTPS download URL.
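
The choice between fileId and url can be captured in a small helper that builds the document object. This is a sketch under stated assumptions: the helper name build_document is hypothetical, and both the "exactly one of fileId or url" rule and the client-side HTTPS check are assumptions inferred from the text above, not confirmed API behavior.

```python
def build_document(file_id=None, url=None, file_name=None):
    """Return the "document" object for an extraction request."""
    # Assumption: the API expects exactly one document source.
    if (file_id is None) == (url is None):
        raise ValueError("provide exactly one of file_id or url")
    if file_id is not None:
        return {"fileId": file_id}
    # Assumption: reject non-HTTPS URLs before sending the request.
    if not url.startswith("https://"):
        raise ValueError("url must be an HTTPS URL")
    document = {"url": url}
    if file_name is not None:
        document["fileName"] = file_name
    return document
```

For example, build_document(url="https://example.com/documents/lab-result.pdf", file_name="lab-result.pdf") produces the document object used in the URL request above.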

Hub routing

You can route an extraction in two ways:

Method             How it works
Explicit hub       Send hubId when the client knows exactly which hub should process the document.
Automatic routing  Omit hubId; Plextera can use metadata and organization configuration to choose the hub.

For example, this request body routes to an explicit hub:

    {
      "document": { "fileId": "file_01JY7M4ZVX5R1P3M3Q0TA1S7ZM" },
      "hubId": "69ccad03c7574856f010eaa5",
      "metadata": {
        "documentType": "lab_result",
        "clientDocumentId": "lab-42"
      }
    }

metadata keys and values must be non-empty strings. A request can include at most 50 entries; each key can be up to 64 characters and each value up to 512 characters.
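
You may want to check these limits on the client before submitting. The following is a minimal sketch: validate_metadata is a hypothetical helper, and it assumes that "characters" corresponds to Python string length, which may differ from how the API counts.

```python
MAX_ENTRIES = 50
MAX_KEY_LENGTH = 64
MAX_VALUE_LENGTH = 512


def validate_metadata(metadata):
    """Return a list of problems; an empty list means the metadata passes."""
    problems = []
    if len(metadata) > MAX_ENTRIES:
        problems.append(f"too many entries: {len(metadata)} > {MAX_ENTRIES}")
    for key, value in metadata.items():
        if not isinstance(key, str) or not key:
            problems.append(f"key {key!r} must be a non-empty string")
        elif len(key) > MAX_KEY_LENGTH:
            problems.append(f"key {key!r} exceeds {MAX_KEY_LENGTH} characters")
        if not isinstance(value, str) or not value:
            problems.append(f"value for {key!r} must be a non-empty string")
        elif len(value) > MAX_VALUE_LENGTH:
            problems.append(f"value for {key!r} exceeds {MAX_VALUE_LENGTH} characters")
    return problems
```

Returning a list instead of raising on the first problem lets you report every issue in one pass.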


Poll for output

Call GET /document-insights/extractions/{extractionId} until the extraction reaches a terminal status.

    curl https://api.plextera.com/api/public/v1/document-insights/extractions/69654f0bc073ef404baec649 \
      -H "Authorization: api-key YOUR_API_KEY"

When status is COMPLETED, outputAvailable is true and output contains the extracted fields.

    {
      "id": "69654f0bc073ef404baec649",
      "operation": "extract",
      "status": "COMPLETED",
      "outputAvailable": true,
      "output": {
        "fieldCount": 3,
        "fields": [
          {
            "id": "field_01",
            "name": "labName",
            "type": "text",
            "value": "Quest Diagnostics",
            "metadata": {
              "extracted": true,
              "confidence": 1.0,
              "page": 1,
              "placement": { "x": 43, "y": 12, "width": 22, "height": 4 }
            }
          }
        ]
      }
    }
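
Once you have a completed response, you will usually look fields up by name and decide which ones need a second look. The sketch below shows one way to do that. The helper names are hypothetical, the 0.9 threshold is arbitrary, and it assumes field names are unique within an extraction and that confidence ranges from 0 to 1 (the sample above shows only 1.0).

```python
def index_fields(extraction):
    """Map field name to field object; empty if no output is available."""
    if not extraction.get("outputAvailable"):
        return {}
    return {field["name"]: field for field in extraction["output"]["fields"]}


def fields_needing_review(extraction, min_confidence=0.9):
    """Return the ids of fields with missing or low confidence."""
    flagged = []
    for field in index_fields(extraction).values():
        confidence = field.get("metadata", {}).get("confidence")
        if confidence is None or confidence < min_confidence:
            flagged.append(field["id"])
    return flagged
```

The ids returned by fields_needing_review have the same form as the fieldId accepted by the feedback endpoint.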

Avoid tight polling loops. Poll with a delay and stop as soon as the status is COMPLETED, FAILED, or REJECTED.
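
One way to follow this advice is a polling loop with exponential backoff and an overall timeout. This is a sketch, not a recommended configuration: poll_extraction is a hypothetical helper, the delay and timeout values are arbitrary assumptions, and fetch stands for any function you supply that performs the GET request and returns the parsed JSON.

```python
import time

TERMINAL_STATUSES = {"COMPLETED", "FAILED", "REJECTED"}


def poll_extraction(fetch, extraction_id, initial_delay=2.0, max_delay=30.0,
                    timeout=600.0, sleep=time.sleep, clock=time.monotonic):
    """Call fetch(extraction_id) until the status is terminal or time runs out."""
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        extraction = fetch(extraction_id)
        if extraction["status"] in TERMINAL_STATUSES:
            return extraction
        # Give up instead of sleeping past the deadline.
        if clock() + delay > deadline:
            raise TimeoutError(f"extraction {extraction_id} did not finish in time")
        sleep(delay)
        # Double the wait after each attempt, capped at max_delay.
        delay = min(delay * 2, max_delay)
```

Passing sleep and clock as parameters keeps the loop testable without real waiting. The caller still has to inspect the returned status, because FAILED and REJECTED also end the loop.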


Receive extraction events

If you do not want to poll, create an event subscription for:

  • document-insights.extraction.completed
  • document-insights.extraction.failed
  • document-insights.extraction.rejected

For completed extractions, the event payload includes the same completed extraction model as GET /document-insights/extractions/{extractionId}, including output.

See Event Subscriptions for setup, headers, signatures, and retry behavior.
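
After your webhook endpoint has verified and parsed an event, the three event types can be mapped to the terminal statuses. The sketch below deliberately takes the event type and the extraction model as separate arguments, because the envelope format is described in Event Subscriptions and is not assumed here. The handler name is hypothetical, and it assumes that failed and rejected events carry an extraction model with id and error fields, which this page does not confirm.

```python
EVENT_STATUS = {
    "document-insights.extraction.completed": "COMPLETED",
    "document-insights.extraction.failed": "FAILED",
    "document-insights.extraction.rejected": "REJECTED",
}


def handle_extraction_event(event_type, extraction):
    """Summarize an extraction event, or return None for other event types."""
    status = EVENT_STATUS.get(event_type)
    if status is None:
        return None
    summary = {"extractionId": extraction["id"], "status": status}
    if status == "COMPLETED":
        summary["output"] = extraction.get("output")
    else:
        # Assumption: failed and rejected payloads expose an error field.
        summary["error"] = extraction.get("error")
    return summary
```

Use the extractionId in the summary to correlate the event with the ID you stored when creating the extraction.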


Submit feedback

Use feedback when a value is incorrect or the extraction needs review.

    curl -X POST https://api.plextera.com/api/public/v1/document-insights/extractions/69654f0bc073ef404baec649/feedback \
      -H "Authorization: api-key YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "fieldId": "field_03",
        "message": "Collection date should be 2026-04-05."
      }'

message is required and can contain up to 1024 characters. fieldId is optional; include it when the feedback applies to one extracted field.
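
These rules can be enforced before sending the request. The following is a minimal sketch: build_feedback is a hypothetical helper, and it assumes the 1024-character limit matches Python string length.

```python
MAX_MESSAGE_LENGTH = 1024


def build_feedback(message, field_id=None):
    """Return the JSON body for the feedback endpoint."""
    if not message:
        raise ValueError("message is required")
    if len(message) > MAX_MESSAGE_LENGTH:
        raise ValueError(f"message exceeds {MAX_MESSAGE_LENGTH} characters")
    body = {"message": message}
    # fieldId is optional; include it only for field-level feedback.
    if field_id is not None:
        body["fieldId"] = field_id
    return body
```

Omit field_id when the feedback concerns the extraction as a whole.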