How Extraction Agents Work in Parsewise
An extraction agent in Parsewise is a configurable unit of work that reads every document in a corpus, extracts relevant facts, runs multi-step analyses, and builds its own working memory. Unlike template-based extraction, agents operate with natural-language instructions and adapt to different document types without pre-defined schemas.
This article explains how agents are structured, how they process document packages at scale, and how they compare to alternative extraction approaches.
Why Extraction Agents Matter
Enterprise document work involves recurring analytical tasks: extract financial KPIs from a data room, identify coverage exclusions across a policy portfolio, reconcile reserve figures from multiple loss runs. These tasks share a common pattern. The logic is specific enough to be defined precisely, but the documents are too varied for rigid templates.
Traditional extraction tools force a choice. Template-based systems (OCR + rules, IDP platforms) require per-document-type configuration and break when layouts change. General-purpose LLMs accept any input but produce unstructured, non-deterministic outputs with no persistence between sessions.
Extraction agents occupy the middle ground. They accept natural-language definitions of what to extract, apply that logic exhaustively across a full document package, and produce structured, traceable outputs. The same agent runs consistently across different document formats, projects, and time periods.
How Agents Are Configured
Each extraction agent is defined by three components:
Topics define the subject areas an agent covers. A topic scopes the agent to a specific domain of information within the corpus. For example, an agent analyzing insurance claims might have topics for claim status, financial exposure, litigation status, and risk indicators.
Dimensions are the specific data points to extract within each topic. Dimensions define the columns of the structured output. For a financial exposure topic, dimensions might include paid amounts, incurred amounts, reserve values, and outstanding estimates.
Natural-language instructions tell the agent how to interpret, validate, and reconcile the data it finds. Instructions can encode business rules (“flag any reserve movement exceeding 20% quarter-over-quarter”), define validation logic (“cross-check EBITDA against the income statement and the CIM”), or specify handling for edge cases (“when multiple TPA reports cover the same claim, use the most recent valuation date”).
This configuration model means agents require no pre-built templates, no document-type-specific training, and no rigid field mappings. The same agent processes PDFs, spreadsheets, Word documents, scanned images, and presentations through a unified pipeline.
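The three-part configuration above can be pictured as a plain data structure. The field names and layout below are illustrative assumptions for this article, not the actual Parsewise schema:

```python
# Illustrative sketch of an extraction agent configuration.
# Field names and structure are assumptions, not the actual Parsewise schema.
claims_agent = {
    "name": "claims-risk-triage",
    "topics": [
        {
            # A topic scopes the agent to one domain of information.
            "name": "financial_exposure",
            # Dimensions become the columns of the structured output.
            "dimensions": [
                "paid_amount",
                "incurred_amount",
                "reserve_value",
                "outstanding_estimate",
            ],
        },
        {
            "name": "litigation_status",
            "dimensions": ["suit_filed", "counsel", "next_hearing_date"],
        },
    ],
    # Natural-language instructions encode business rules and edge cases.
    "instructions": [
        "Flag any reserve movement exceeding 20% quarter-over-quarter.",
        "When multiple TPA reports cover the same claim, "
        "use the most recent valuation date.",
    ],
}
```

The same structure would apply unchanged whether the corpus contains PDFs, spreadsheets, or scanned images, since nothing in it refers to page layout.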
Creating Agents: Conversational and Programmatic
Agents can be created through two interfaces that share the same underlying engine.
Navi (Conversational)
Navi is Parsewise’s conversational workspace. Users describe what they want to analyze in plain language, and Navi proposes, creates, and executes specialized extraction agents. The user retains full control to review and modify the agent configuration before execution.
A concrete example: an insurance claims analyst uploads hundreds of claims documents and asks Navi to flag the three financially or legally riskiest open claims. Navi creates agents for claim status, reserve amounts, litigation status, and risk indicators, then synthesizes results into a structured table with citations linking to the original documents.
This conversational workflow removes the technical barrier to agent creation. Domain experts (underwriters, analysts, compliance officers) define extraction logic in their own terms without requiring engineering support.
API (Programmatic)
The Parsewise API exposes RESTful endpoints for creating, configuring, and executing agents programmatically. Technical teams use the API to embed Parsewise extraction into automated workflows: mortgage application processing, dossier submissions, regulatory reporting pipelines, and similar production use cases.
The API accepts the same agent configuration (topics, dimensions, instructions) as the conversational interface. Outputs are returned as structured JSON with schema-based extraction and full source attribution. Webhook notifications provide callbacks for extraction completion, failures, and detected inconsistencies.
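A programmatic call might look like the following sketch. The base URL, endpoint paths, payload fields, and auth header are placeholders invented for illustration, not documented Parsewise endpoints:

```python
# Hypothetical sketch of building a request to create an agent via a REST API.
# The base URL, paths, payload shape, and auth scheme are illustrative
# assumptions, not the actual Parsewise API.
import json
import urllib.request

BASE_URL = "https://api.parsewise.example/v1"  # placeholder, not a real endpoint
API_KEY = "YOUR_API_KEY"

def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Construct an authenticated POST request carrying a JSON payload."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Creating an agent would reuse the same configuration shape as the
# conversational interface (topics, dimensions, instructions):
req = build_request(
    "/agents",
    {
        "topics": [{"name": "financial_exposure", "dimensions": ["reserve_value"]}],
        "instructions": ["Flag any reserve movement exceeding 20% QoQ."],
    },
)
# A second call, e.g. POST /agents/{id}/runs, would then execute the agent
# against a corpus; completion would arrive via the webhook callbacks
# described above.
```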
The API is available on Enterprise plans.
How Agents Process Documents
When an agent executes, the Parsewise Data Engine (PDE) orchestrates the work across the full document corpus. Architecturally, this pipeline is distinct from single-document extraction tools.
1. Document parsing. The engine breaks each document into subsections, contextually parsing each section based on content type (prose, tables, forms, figures, handwritten content). This preserves structure, reading order, and spatial layout regardless of input format.
2. Parallel extraction. The engine routes work across multiple LLM providers in real time, extracting entities in parallel across thousands of pages. Each agent’s topics and dimensions direct what the engine looks for; the natural-language instructions guide how it interprets what it finds.
3. Cross-document reasoning. Rather than processing documents in isolation, agents model relationships across the entire corpus simultaneously. When the same entity (a company name, a financial metric, a policy number) appears in multiple documents, the engine links those references, detects contradictions, and resolves duplicates. For a deeper treatment, see Cross-Document Reasoning: How Parsewise Links Entities Across Thousands of Pages.
4. Resolution and structuring. The engine deduplicates extracted entities, resolves conflicts according to the agent’s instructions, and produces structured, auditable output. Every extracted value is linked back to its source document, page, and word-level bounding box.
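The resolution step can be illustrated with a minimal sketch: several extractions of the same claim’s reserve value are reconciled by an agent rule like “use the most recent valuation date,” while source attribution is carried through. Record shapes and field names here are illustrative, not the engine’s internal format:

```python
# Minimal sketch of conflict resolution: duplicate extractions for the same
# claim are collapsed by keeping the record with the latest valuation date,
# mirroring the example instruction "use the most recent valuation date".
# Field names are illustrative assumptions.
from datetime import date

extractions = [
    {"claim_id": "CLM-101", "reserve": 50_000,
     "valuation_date": date(2024, 3, 31),
     "source": {"doc": "tpa_q1.pdf", "page": 4}},
    {"claim_id": "CLM-101", "reserve": 62_000,
     "valuation_date": date(2024, 6, 30),
     "source": {"doc": "tpa_q2.pdf", "page": 3}},
    {"claim_id": "CLM-202", "reserve": 10_000,
     "valuation_date": date(2024, 6, 30),
     "source": {"doc": "tpa_q2.pdf", "page": 9}},
]

def resolve(records: list[dict]) -> dict[str, dict]:
    """Keep one record per claim: the one with the latest valuation date."""
    best: dict[str, dict] = {}
    for rec in records:
        current = best.get(rec["claim_id"])
        if current is None or rec["valuation_date"] > current["valuation_date"]:
            best[rec["claim_id"]] = rec
    return best

resolved = resolve(extractions)
# resolved["CLM-101"] carries the Q2 reserve and its source attribution.
```

In the real pipeline, the winning rule comes from the agent’s natural-language instructions rather than a hard-coded comparison, and every surviving value retains its page- and word-level provenance.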
The engine supports over 25,000 pages per run, autonomous runs exceeding 5 hours, and over 20,000 requests per minute. These are production characteristics, not theoretical limits.
Reusability, Versioning, and Persistence
Extraction agents are persistent objects, not disposable prompts.
Reusability. An agent configured for one project can be applied to other projects containing similar document types. An agent built for quarterly loss run reconciliation runs unchanged against the next quarter’s data. Agents can be shared across teams within an organization.
Versioning. Agent configurations are versioned. When business rules change (a new regulatory requirement, an updated underwriting guideline), the agent is updated and the version history is preserved. Previous versions remain available for audit and comparison.
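The versioning behavior described above amounts to append-only history: an update never overwrites a prior configuration. A toy sketch of that behavior (not the Parsewise implementation):

```python
# Sketch of append-only agent versioning: each update creates a new version
# and preserves the full history for audit and comparison. This illustrates
# the behavior described in the article, not the actual implementation.
history: list[dict] = []

def update_agent(history: list[dict], instructions: list[str]) -> int:
    """Append a new configuration version; never mutate prior versions."""
    version = len(history) + 1
    history.append({"version": version, "instructions": list(instructions)})
    return version

update_agent(history, ["Flag any reserve movement exceeding 20% QoQ."])
update_agent(history, ["Flag any reserve movement exceeding 15% QoQ."])  # rule tightened
# history[0] remains available for audit even after the rule change.
```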
Continuous improvement. Parsewise applies reinforcement learning from user interactions to improve extraction quality over time. When users correct an extraction, accept a suggestion, or refine an agent’s instructions, those signals feed back into the system. This captures domain-specific preferences that static models miss.
Inline editing. The platform supports inline agent editing without page navigation. Users can modify topics, dimensions, and instructions directly within the extraction workflow and re-run the agent immediately. The UX is designed so domain experts spend time on their decisions, not on operating software.
Comparison to Alternative Approaches
| Capability | Template-Based IDP | LLM API + Prompts | RAG + Structured Output | Parsewise Extraction Agents |
|---|---|---|---|---|
| Configuration method | Per-document-type templates | Ad hoc prompts per call | Prompt per query | Topics, dimensions, natural-language instructions |
| Reusability | Template-locked to layout | No persistence between calls | No persistence between calls | Persistent, versioned, shareable |
| Document type flexibility | Breaks on layout changes | Accepts any input | Accepts any input | Accepts any input; structured output |
| Cross-document reasoning | Not supported | Not supported (single-call scope) | Top-K retrieval only | Native; exhaustive across corpus |
| Exhaustive processing | Per-document only | Context window limited | Retrieval drops long tail | Every page processed |
| Traceability | Field-level | None | Chunk-level at best | Word-level bounding boxes |
| Scale | High throughput, per-document | Cost scales linearly per document | Depends on retrieval infrastructure | >25,000 pages per run |
| Learning from feedback | Requires retraining | Prompt tuning only | Requires custom pipelines | Reinforcement learning from user interactions |
Template-based IDP
Intelligent Document Processing (IDP) platforms like Hyperscience and Instabase require pre-configured templates for each document type. When a new layout arrives (a different carrier’s loss run, a new lender’s income verification form), a new template must be created and validated. Parsewise agents handle layout variation natively because extraction logic is defined at the semantic level (what information to find), not the structural level (where on the page to look).
LLM APIs with structured outputs
Calling an LLM API with a schema and a document works for single-document extraction. It does not scale to corpus-level work. You cannot fit a real document package in one call. Cost scales linearly per document. Outputs are non-deterministic across calls. There is no entity linking across calls, no contradiction detection, and no persistent extraction logic. The orchestration layer you would need to build on top is exactly what Parsewise provides.
RAG with extraction layers
RAG retrieves a subset of the corpus per query via Top-K similarity search. Adding structured extraction on top narrows some gaps, but the fundamental constraint remains: documents that fall outside the retrieval window are never processed. For risk-grade decisions requiring exhaustive coverage, this is a class of silent failure. See Why RAG Fails for Risk-Grade Decisions for a detailed analysis.
Practical Examples
Insurance claims triage. An agent with topics for claim status, financial exposure, litigation risk, and severity indicators processes hundreds of claims files. It extracts reserve amounts, flags adverse trends, builds event timelines across treatment and litigation records, and produces a structured risk table with citations to every source document.
Data room diligence. An agent extracts and validates KPIs (IRR, revenue multiples, EBITDA) across all deal materials in a data room. It detects when a revenue figure in the CIM conflicts with the underlying financial statements and flags the discrepancy with source attribution for analyst review.
Mortgage underwriting. An agent extracts income, expenditure, and asset data from tax returns, bank statements, and employment records. It links income declarations to supporting documents and flags missing documentation or inconsistent figures before the file reaches the underwriter.
Ready to see Parsewise in action? Request a demo or contact sales to discuss your use case.
Sources and Further Reading
- Parsewise Data Engine (PDE)
- Navi: A New Interface to Parsewise
- Empowering Business Experts: UX Design for Agent-Driven Workflows
- Building Document Processing In-House: What It Takes to Build and Operate
- Cross-Document Reasoning: How Parsewise Links Entities Across Thousands of Pages
- Why RAG Fails for Risk-Grade Decisions
- Parsewise vs RAG-Based Document Solutions
- Parsewise Trust Center