Document Extraction Triage Demo

This demo uses synthetic demonstration data to make the Decision-PGA idea more concrete. The setting is a familiar document extraction workflow: an AI system proposes or reviews an extracted field, then needs to choose what the workflow should do next.

Imagine a small work queue. One document arrives, the AI reads it, and the workflow has five possible buttons it could press next: accept the extracted value, ask a person to clarify the meaning, retrieve another source, flag the case for review, or defer because the packet is changing underneath the system.

The point is not that Decision-PGA extracts the field. The point is that a Decision-PGA-style diagnostic can describe the shape of uncertainty around which button should be pressed next.

This page uses no patient data, is not clinical validation, and is not a clinical decision support demonstration.

Action Vocabulary

Each synthetic observation is a probability-like vector over five possible next actions:

Action                 Plain meaning
accept_extraction      The extracted value appears stable enough for the workflow to continue.
ask_for_clarification  The workflow is mainly split between a small number of plausible meanings.
retrieve_more_context  The workflow needs another page, attachment, source, or evidence snippet.
flag_for_review        The value may be usable, but the risk or threshold context deserves review.
defer                  The workflow state is changing enough that action should pause pending re-evaluation.

The fixture is available as JSON: examples/document-triage/demo_cases.json.

Generated diagnostic outputs are also available: examples/document-triage/demo_results.json.

The open-source prototype repository is available at github.com/zmichels/Decision-PGA.

What the numbers stand for

The numbers are not document text. They are the workflow’s repeated estimates of what should happen next after looking at a document situation. In a real system, those estimates might come from repeated model samples, model logprobs, rule checks, OCR perturbations, reviewer votes, or an agent trace. In this demo, they are clean synthetic values so the patterns are easy to see.

You can read each row as one pass through the same case. A row like [0.92, 0.03, 0.02, 0.02, 0.01] says: on this pass, the workflow strongly leans toward accept_extraction. A row like [0.42, 0.45, 0.05, 0.05, 0.03] says: on this pass, the workflow is split between accepting and asking for clarification.

Decision-PGA does not judge the document itself. It reads the group of rows as a cloud of next-action evidence, then asks what shape that cloud has.

How to read the matrices

Each row is one synthetic observation: one repeated model sample, score pass, review vote, perturbation, or agent step. The row values are probabilities over the possible next workflow actions, and each row sums to 1.00.

The columns always follow this order:

  1. accept_extraction
  2. ask_for_clarification
  3. retrieve_more_context
  4. flag_for_review
  5. defer

So the row [0.92, 0.03, 0.02, 0.02, 0.01] means: this observation puts 0.92 probability on accept_extraction, 0.03 on ask_for_clarification, and so on. Decision-PGA reads the full matrix as one probability cloud. It does not diagnose the rows one at a time.
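As a minimal sketch of that reading, the snippet below pairs one row with the column order listed above and picks the leading action. The helper names label_row and top_action are illustrative and are not part of the prototype.

```python
# Column order used throughout the demo.
ACTIONS = [
    "accept_extraction",
    "ask_for_clarification",
    "retrieve_more_context",
    "flag_for_review",
    "defer",
]


def label_row(row):
    """Pair each probability with its action name (illustrative helper)."""
    if len(row) != len(ACTIONS):
        raise ValueError("row must have one value per action")
    return dict(zip(ACTIONS, row))


def top_action(row):
    """Return the action with the highest probability in one row."""
    labeled = label_row(row)
    return max(labeled, key=labeled.get)


row = [0.92, 0.03, 0.02, 0.02, 0.01]
labeled = label_row(row)
print(labeled["accept_extraction"])                 # 0.92
print(top_action(row))                              # accept_extraction
print(top_action([0.42, 0.45, 0.05, 0.05, 0.03]))   # ask_for_clarification
```

This only labels a single row. It says nothing about the shape of the whole cloud, which is what the diagnostic reads.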

How To Use The Demo

  1. Pick a scenario and read the short document story.
  2. Look across the rows, not just at one row. Ask whether the same action keeps winning, whether two actions trade places, or whether the evidence is scattered.
  3. Compare that human reading with the generated diagnostic state.
  4. Use the mapped workflow action as the practical interpretation.
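Step 2 can be approximated by listing which action wins in each row. The sketch below uses the short action labels and the sample rows shown later on this page. The function name winners is an assumption made for illustration.

```python
from collections import Counter

# Short labels in the demo's column order.
ACTIONS = ["accept", "clarify", "retrieve", "review", "defer"]


def winners(rows):
    """Return the top-ranked action for each row, in row order."""
    return [ACTIONS[max(range(len(row)), key=row.__getitem__)] for row in rows]


invoice_rows = [
    [0.92, 0.03, 0.02, 0.02, 0.01],
    [0.91, 0.04, 0.02, 0.02, 0.01],
    [0.94, 0.02, 0.01, 0.02, 0.01],
]
contract_rows = [
    [0.42, 0.45, 0.05, 0.05, 0.03],
    [0.48, 0.39, 0.05, 0.05, 0.03],
    [0.38, 0.50, 0.04, 0.05, 0.03],
]

print(winners(invoice_rows))    # ['accept', 'accept', 'accept']
print(winners(contract_rows))   # ['clarify', 'accept', 'clarify']
print(Counter(winners(contract_rows)).most_common())  # [('clarify', 2), ('accept', 1)]
```

A single repeated winner suggests stability, and two winners trading places suggests a two-way split. The winner list alone cannot tell scattered evidence from a clean dispute, which is why the diagnostic looks at the full probabilities.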

The useful experience is the contrast between cases. A stable invoice due date and a missing attachment can both involve uncertainty, but they should lead to different next actions. The demo is designed to make that difference visible.

Live Diagnostic Workspace

Choose a case, inspect or edit the probability rows, then run the same kind of diagnostic that a CLI, notebook, MCP tool, or agent wrapper would pass to the prototype. The live runner stays entirely in your browser. It does not call a model, upload data, or contact a server beyond loading this page’s synthetic fixture.

The easiest way to use it is human-first: read the document situation, look at which action columns are winning across rows, then click Run diagnostic and compare your intuition with the generated state.

You do not need to type numbers to use the demo. Start with the prebuilt synthetic cases below. The table is editable only so you can poke at the boundary cases after you have a feel for the workflow.

Interactive document-triage diagnostic

Pick a synthetic case, edit the action probabilities if you want, and watch the diagnostic state update from the probability cloud.

  1. Choose a familiar document situation.
  2. Notice whether rows agree, split, scatter, or drift.
  3. Run the diagnostic and compare the suggested action.

Probability rows

Each row is one synthetic pass through the same document situation, such as a model sample, page window, prompt variant, or repeated extraction pass. The columns are possible next actions, and each row should sum to 1.

The prebuilt cases are the intended path. Editing is optional: make two columns alternate as winners to create ambiguity; spread mass across many columns to create missing-context uncertainty; make early and late rows disagree to create drift. Use Generate variation to explore another synthetic cloud without typing values.

Decision-state shape atlas

Run a case to see separate schematic projections of common decision-cloud shapes.

Try one case as a diagnostic payload

This is the shape of the first case as a Decision-PGA diagnostic request, abbreviated here to the first three of its eight rows. The demo page is static, but this is the same structure a CLI, notebook, MCP tool, or agent wrapper would pass to the prototype.

{
  "source": "probability_cloud",
  "label": "clean_invoice_due_date",
  "labels": [
    "accept_extraction",
    "ask_for_clarification",
    "retrieve_more_context",
    "flag_for_review",
    "defer"
  ],
  "probabilities": [
    [0.92, 0.03, 0.02, 0.02, 0.01],
    [0.91, 0.04, 0.02, 0.02, 0.01],
    [0.94, 0.02, 0.01, 0.02, 0.01]
  ]
}
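Before handing a payload like this to any tool, it can help to check that it is well formed. The following is a minimal sketch, assuming only the field names visible in the payload above. The function validate_payload is hypothetical and is not the prototype's own validator.

```python
import json
import math

payload = json.loads("""
{
  "source": "probability_cloud",
  "label": "clean_invoice_due_date",
  "labels": ["accept_extraction", "ask_for_clarification",
             "retrieve_more_context", "flag_for_review", "defer"],
  "probabilities": [
    [0.92, 0.03, 0.02, 0.02, 0.01],
    [0.91, 0.04, 0.02, 0.02, 0.01],
    [0.94, 0.02, 0.01, 0.02, 0.01]
  ]
}
""")


def validate_payload(payload, tol=1e-6):
    """Return a list of problems; an empty list means the rows look usable."""
    labels = payload["labels"]
    problems = []
    for i, row in enumerate(payload["probabilities"]):
        if len(row) != len(labels):
            problems.append(f"row {i}: expected {len(labels)} values, got {len(row)}")
        if any(p < 0 for p in row):
            problems.append(f"row {i}: negative probability")
        if not math.isclose(sum(row), 1.0, abs_tol=tol):
            problems.append(f"row {i}: sums to {sum(row):.4f}, not 1")
    return problems


print(validate_payload(payload))  # []
```

The tolerance exists because decimal probabilities rarely sum to exactly 1 in floating point. The value 1e-6 is an arbitrary choice for this sketch.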

The generated readout for the full eight-row fixture is:

{
  "state": "stable",
  "recommended_action": "proceed",
  "demo_workflow_action": "accept_extraction",
  "top_labels": ["accept_extraction", "ask_for_clarification", "flag_for_review"]
}

Scenario Summary

The summary uses short action labels to stay readable. The full action names are listed above in the Action Vocabulary and repeated in the scenario readouts. Read the cases from top to bottom: they move from a clean extraction, to a two-choice ambiguity, to missing evidence, to threshold sensitivity, to a sequence that changes over time.

Case                           State               Action    Cue
Clean invoice due date         stable              accept    Repeated observations point to the same action.
Two plausible contract dates   binary ambiguous    clarify   The workflow is mostly split between two choices.
Missing attachment reference   diffuse             retrieve  Uncertainty is scattered because the evidence is incomplete.
Near-threshold total           boundary-sensitive  review    Small perturbations alter whether to accept or review.
Contradictory revision packet  drifting            defer     The preferred action changes over the read sequence.
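The summary above amounts to a lookup from diagnostic state to workflow action. A minimal sketch of that lookup follows. The state strings are copied from the summary table, and the exact identifiers the prototype emits may differ, so treat the keys as assumptions.

```python
# State labels as written in the summary table (assumed, not the prototype's identifiers).
STATE_TO_ACTION = {
    "stable": "accept_extraction",
    "binary ambiguous": "ask_for_clarification",
    "diffuse": "retrieve_more_context",
    "boundary-sensitive": "flag_for_review",
    "drifting": "defer",
}


def workflow_action(state):
    """Map a diagnostic state to the demo's workflow action (hypothetical helper)."""
    try:
        return STATE_TO_ACTION[state]
    except KeyError:
        # Fail loudly rather than guess an action for an unmapped state.
        raise ValueError(f"no workflow action mapped for state {state!r}") from None


print(workflow_action("stable"))    # accept_extraction
print(workflow_action("drifting"))  # defer
```

Raising on an unknown state is a design choice of this sketch. A real workflow might prefer a conservative default instead.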

Visual Walkthrough

The overview below connects each synthetic document situation to the mean action probabilities, the top action sequence across observations, the Decision-PGA state, and the workflow action.

[Figure: Document extraction triage visual summary showing synthetic probability clouds mapped to decision states and workflow actions.]
Each row uses the same action vocabulary. The colored bar summarizes mean action probability, the small squares show which action was top-ranked in each observation, and the right side shows the diagnostic state mapped to a workflow action.
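The mean action probability behind each colored bar is a column average over the rows of one case. A minimal sketch, using the three sample rows of the missing-attachment case shown further down (the function name column_means is illustrative):

```python
ACTIONS = ["accept", "clarify", "retrieve", "review", "defer"]


def column_means(rows):
    """Average each action column across all rows of one case."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


attachment_rows = [
    [0.18, 0.20, 0.30, 0.18, 0.14],
    [0.22, 0.17, 0.27, 0.19, 0.15],
    [0.16, 0.22, 0.29, 0.17, 0.16],
]

means = column_means(attachment_rows)
for action, value in zip(ACTIONS, means):
    print(f"{action:<9}{value:.3f}")
```

Because every row sums to 1, the column means also sum to 1, so they can be drawn as a single stacked bar.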
stable -> accept_extraction

Clean invoice due date

A vendor invoice shows a clearly labeled due date near the payment total. If this were in a work queue, most reviewers would expect it to move on. The observations form a tight cloud around accept_extraction.

Input cloud sample

obs  accept  clarify  retrieve  review  defer
1    0.92    0.03     0.02      0.02    0.01
2    0.91    0.04     0.02      0.02    0.01
3    0.94    0.02     0.01      0.02    0.01

Across the full eight-row fixture, accept remains the clear winner.

Generated diagnostic readout

Decision-PGA state: stable
Workflow action: accept extraction
Mean margin: 0.90
Dispersion: 0.002

A tight cloud with a large margin means the workflow is not just confident once; it is repeatedly stable.
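To make "margin" and "dispersion" more tangible, here is one plausible way to compute similar quantities. Both definitions are assumptions: margin is taken as the gap between the top two mean action probabilities, and dispersion as the mean squared Euclidean distance of rows from their centroid. The prototype may define them differently, and these three sample rows will not necessarily reproduce the eight-row readout above.

```python
def column_means(rows):
    """Average each action column across rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def mean_margin(rows):
    """Gap between the two largest mean action probabilities (assumed definition)."""
    top, second = sorted(column_means(rows), reverse=True)[:2]
    return top - second


def dispersion(rows):
    """Mean squared distance of rows from their centroid (assumed definition)."""
    centroid = column_means(rows)
    total = sum(sum((x - c) ** 2 for x, c in zip(row, centroid)) for row in rows)
    return total / len(rows)


invoice_rows = [
    [0.92, 0.03, 0.02, 0.02, 0.01],
    [0.91, 0.04, 0.02, 0.02, 0.01],
    [0.94, 0.02, 0.01, 0.02, 0.01],
]
contract_rows = [
    [0.42, 0.45, 0.05, 0.05, 0.03],
    [0.48, 0.39, 0.05, 0.05, 0.03],
    [0.38, 0.50, 0.04, 0.05, 0.03],
]

print(round(mean_margin(invoice_rows), 2), dispersion(invoice_rows))
print(round(mean_margin(contract_rows), 2), dispersion(contract_rows))
```

Under these assumed definitions the invoice rows show a large margin and a small spread, while the contract rows show a small margin and a larger spread.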

binary ambiguous -> ask_for_clarification

Two plausible contract dates

A contract amendment includes both an effective date and a signature date. A person can understand the confusion immediately: both dates are real, but they answer different questions. The cloud mostly varies along one axis: accept the extraction, or ask which date definition the user intended.

Input cloud sample

obs  accept  clarify  retrieve  review  defer
1    0.42    0.45     0.05      0.05    0.03
2    0.48    0.39     0.05      0.05    0.03
3    0.38    0.50     0.04      0.05    0.03

This is not broad confusion; it is a focused two-action dispute.

Generated diagnostic readout

Decision-PGA state: binary ambiguity
Workflow action: ask for clarification
PC1 fraction: 0.98
Mean margin: 0.01

Most variation lies along one axis and the leading actions are nearly tied, so the useful move is a targeted clarification.

diffuse -> retrieve_more_context

Missing attachment reference

A purchase request says the approved amount is in an attached quote, but only the cover page is available. The probability mass spreads across several actions because the workflow lacks the source it needs.

Input cloud sample

obs  accept  clarify  retrieve  review  defer
1    0.18    0.20     0.30      0.18    0.14
2    0.22    0.17     0.27      0.19    0.15
3    0.16    0.22     0.29      0.17    0.16

The pattern reads like missing context, not a clean two-option choice.

Generated diagnostic readout

Decision-PGA state: diffuse uncertainty
Workflow action: retrieve more context
PC1 fraction: 0.64
Mean margin: 0.08

The uncertainty is scattered rather than cleanly two-way, so the demo routes toward more context.

boundary-sensitive -> flag_for_review

Near-threshold total

A reimbursement form total is legible, but the extracted value is close to an internal manual-review threshold. The safest route is not automatic rejection; it is targeted review of a boundary case.

Input cloud sample

obs  accept  clarify  retrieve  review  defer
1    0.56    0.03     0.04      0.34    0.03
2    0.58    0.03     0.04      0.32    0.03
5    0.38    0.03     0.04      0.52    0.03

The value itself may be readable, but the action depends on a threshold.

Generated diagnostic readout

Decision-PGA state: boundary sensitive
Workflow action: flag for review
PC1 fraction: 0.97
Half-cloud distance: 0.19

The samples move coherently along a low-margin boundary, so the demo chooses targeted review.
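One simple way to see the boundary in the sample rows is to track the gap between accept and review in each observation and count how often its sign flips. This is an illustrative check on the three rows shown above, not the prototype's own test for boundary sensitivity.

```python
ACCEPT, REVIEW = 0, 3  # column positions in the demo's action order

boundary_rows = [
    [0.56, 0.03, 0.04, 0.34, 0.03],  # obs 1
    [0.58, 0.03, 0.04, 0.32, 0.03],  # obs 2
    [0.38, 0.03, 0.04, 0.52, 0.03],  # obs 5
]

# Positive gap: accept leads. Negative gap: review leads.
gaps = [row[ACCEPT] - row[REVIEW] for row in boundary_rows]

# Count sign changes between consecutive observations.
flips = sum(1 for a, b in zip(gaps, gaps[1:]) if (a > 0) != (b > 0))

print([round(g, 2) for g in gaps])  # [0.22, 0.26, -0.14]
print(flips)                        # 1
```

The other three columns barely move, so all of the change sits on the accept-versus-review axis.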

drifting -> defer

Contradictory revision packet

A multi-page packet starts clean, then later pages introduce a revision note and a conflicting total. The sequence matters, so the workflow should pause and re-evaluate before acting.

Input cloud sample

obs  accept  clarify  retrieve  review  defer
1    0.92    0.03     0.02      0.02    0.01
4    0.78    0.06     0.05      0.08    0.03
8    0.02    0.02     0.03      0.10    0.83

The rows tell a time story: early evidence and late evidence disagree.

Generated diagnostic readout

Decision-PGA state: regime shift
Workflow action: defer
Dispersion: 0.303
Half-cloud distance: 1.09

The early and late cloud means are far apart, so the demo pauses instead of treating the packet as one static extraction.
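A rough version of the early-versus-late comparison is to split the rows in order, average each half, and measure how far apart the two averages are. The sketch below uses Euclidean distance on the three sample rows. Both the split rule and the metric are assumptions, so the result is not expected to equal the readout value, which comes from the full fixture.

```python
import math


def half_cloud_distance(rows):
    """Distance between the mean of the early rows and the mean of the late rows."""
    if len(rows) < 2:
        raise ValueError("need at least two rows to split the cloud")
    mid = len(rows) // 2
    early, late = rows[:mid], rows[mid:]
    early_mean = [sum(col) / len(early) for col in zip(*early)]
    late_mean = [sum(col) / len(late) for col in zip(*late)]
    return math.dist(early_mean, late_mean)


packet_rows = [
    [0.92, 0.03, 0.02, 0.02, 0.01],  # obs 1
    [0.78, 0.06, 0.05, 0.08, 0.03],  # obs 4
    [0.02, 0.02, 0.03, 0.10, 0.83],  # obs 8
]
invoice_rows = [
    [0.92, 0.03, 0.02, 0.02, 0.01],
    [0.91, 0.04, 0.02, 0.02, 0.01],
    [0.94, 0.02, 0.01, 0.02, 0.01],
]

print(round(half_cloud_distance(packet_rows), 3))
print(round(half_cloud_distance(invoice_rows), 3))
```

The revision packet's halves sit far apart while the clean invoice's halves nearly coincide. This only works when row order reflects read order.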

What This Demonstrates

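These examples are deliberately simple. They show how a workflow can benefit from distinguishing why it is uncertain: repeated agreement supports accepting, a focused two-way split calls for clarification, scattered probability mass points to missing context, coherent movement along a low-margin boundary calls for review, and disagreement between early and late evidence calls for a pause.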

That is the practical idea behind Decision-PGA as an agent-facing diagnostic: turn a cloud of decision evidence into a state description that helps choose the next workflow action.