Building production-grade insurance agents just got a major infrastructure upgrade.

We collaborated with OpenAI to help make their new Agents SDK work for insurance-grade document processing. The result: 100% page extraction across our benchmark, including 900+ page loss runs that crashed other models. Here's what that means for your team.

"Processing complex insurance documents at production quality requires both reliability and strong runtime infrastructure. OpenAI's Agents SDK has raised the bar for document extraction, even processing 900+ page loss runs that crashed other frontier models and harnesses." — Sashank Gondala, CTO, FurtherAI

Key takeaways

  • FurtherAI worked directly with OpenAI to ensure their new Agents SDK can handle the complexity of real insurance documents
  • FurtherAI customers get faster, more reliable workflows on a stronger foundation, with the same single-tenant security they already trust
  • This collaboration signals our ongoing commitment to working with leading model providers to push the boundaries of what AI can do for insurance

Why this matters for insurance teams

If you're an underwriter or claims adjuster, you already know the problem. Insurance runs on documents, and those documents are brutal. A single commercial submission can be 200 pages. A loss run from a large program can exceed 900. They're dense, inconsistent, and no two are formatted the same way.

Most AI tools struggle with this. They work fine on short, clean documents. But throw a 900-page loss run at them and they choke, skip pages, or crash entirely. That's not a minor inconvenience when you're trying to triage submissions or validate coverage details at scale.

We've spent years solving this problem at FurtherAI. We know what makes insurance documents hard, and we've built our platform specifically to handle them. But we also know that great AI for insurance doesn't happen in isolation. It requires collaboration with the teams building the underlying models.

What we did with OpenAI

OpenAI recently launched their Agents SDK, a toolkit that lets AI systems take on longer, more complex tasks with better reliability. Think of it as upgraded infrastructure for the kind of multi-step work insurance demands.

We worked closely with OpenAI's team to stress-test this infrastructure against real insurance documents. Not clean demos or curated samples. We used internal benchmarks with dense, complex loss runs, submissions, and policy documents — no customer data was shared or used in this process.

The benchmark results were clear. FurtherAI achieved 100% page extraction on loss runs exceeding 900 pages. These are the same documents that caused other models to fail outright.

For your team, that translates to fewer errors, faster turnaround, and less time spent manually checking whether the AI missed something buried on page 847.
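One way to make "did the AI miss a page?" a checkable question is to compare the page numbers an extraction run reports against the document's page count. The sketch below is a minimal illustration, assuming each extracted record carries its source page number. The function name and the sample data are hypothetical and do not describe FurtherAI's or OpenAI's implementation.

```python
from typing import Iterable


def page_coverage(extracted_pages: Iterable[int], total_pages: int) -> tuple[float, list[int]]:
    """Return (fraction of pages covered, sorted list of missing page numbers)."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    # Ignore duplicates and any page numbers outside the document's range.
    seen = {p for p in extracted_pages if 1 <= p <= total_pages}
    missing = [p for p in range(1, total_pages + 1) if p not in seen]
    return len(seen) / total_pages, missing


# Hypothetical 900-page loss run where extraction skipped page 847.
pages = [p for p in range(1, 901) if p != 847]
coverage, missing = page_coverage(pages, 900)
print(f"{coverage:.2%} covered, missing pages: {missing}")
```

A result of 100% page extraction corresponds to a coverage of 1.0 and an empty list of missing pages.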

What changes for FurtherAI customers

Your day-to-day experience with FurtherAI stays the same. You're still running the same workflows: submission intake, underwriting triage, policy comparison, claims processing, compliance checks.

What changes is the foundation underneath. Your workflows now run on stronger infrastructure, which means three things:

  1. More reliable extraction on large documents. Loss runs, statements of values (SOVs), and supplemental packages process more consistently, even when they run into the hundreds of pages.
  2. Better performance on complex, multi-step workflows. When your AI assistant needs to ingest a submission, extract key data, validate it against guidelines, and draft a recommendation, each step now runs on a more dependable backbone.
  3. The same security you already trust. FurtherAI's single-tenant architecture hasn't changed. Your data stays isolated. This collaboration strengthens the engine; it doesn't change how we protect your information.

How this works in practice

Let's walk through an illustrative example. Say a 200-page commercial property submission lands in your queue.

Your FurtherAI workflow picks it up automatically. It reads the application, pulls in the accompanying loss runs (let's say they run another 600 pages), scans the SOV, and checks the supplementals. It extracts the data you need: property locations, TIV breakdowns, construction types, loss history.

Then it validates what it found against your underwriting guidelines. Are there pre-1972 buildings exceeding your renovation threshold? Missing vacancy data? Coverage gaps? Those get flagged with clear explanations, not just raw data dumps.

Finally, it drafts a triage recommendation. Decline, refer, or proceed, with the reasoning laid out for you to review.

The whole process used to depend on each of those steps holding together without dropping context or skipping pages. With the improved infrastructure from this collaboration, that chain is more reliable than it's ever been.
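The validate and triage steps described above can be sketched in a few lines. This is a simplified illustration and not FurtherAI's actual workflow: the Location fields, the 1972 cutoff as a stand-alone rule, the flag-count thresholds, and the sample addresses are all assumptions made up for the example. Real underwriting guidelines would be far richer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Location:
    address: str
    year_built: int
    tiv: float                     # total insured value, in dollars
    vacancy_rate: Optional[float]  # None when the submission omits it


def validate(locations, cutoff_year=1972):
    """Return human-readable flags; the two rules here are illustrative only."""
    flags = []
    for loc in locations:
        if loc.year_built < cutoff_year:
            flags.append(f"{loc.address}: built {loc.year_built}, before {cutoff_year}")
        if loc.vacancy_rate is None:
            flags.append(f"{loc.address}: vacancy data missing")
    return flags


def triage(flags, refer_at=1, decline_at=3):
    """Map a flag count to decline / refer / proceed (hypothetical thresholds)."""
    if len(flags) >= decline_at:
        return "decline"
    if len(flags) >= refer_at:
        return "refer"
    return "proceed"


# Made-up extracted data standing in for the output of the extraction step.
submission = [
    Location("12 Mill St", 1965, 4_200_000, None),
    Location("80 Harbor Rd", 2004, 9_750_000, 0.05),
]
flags = validate(submission)
print(triage(flags), flags)
```

Keeping each flag as a sentence rather than a bare code mirrors the point above: the reviewer sees the reason for a recommendation, not just the outcome.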

Why we're multi-model by design

FurtherAI isn't built on a single model. We work with OpenAI, Anthropic, and other providers, and we use whichever one delivers the best outcome for a given insurance workflow. A model that excels at extracting data from a 900-page loss run might not be the best choice for drafting a triage recommendation or comparing policy language. We pick the right tool for the job.
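As a rough picture of what "the right tool for the job" can mean in code, the sketch below picks, for each task, the provider with the best score on an internal benchmark. Everything here is an assumption for illustration: the task names, the provider labels, and the scores are placeholders, not real benchmark results or FurtherAI's routing logic.

```python
def pick_provider(scores: dict[str, dict[str, float]], task: str) -> str:
    """Return the provider with the highest benchmark score for the given task."""
    candidates = scores.get(task)
    if not candidates:
        raise KeyError(f"no benchmark scores recorded for task {task!r}")
    return max(candidates, key=candidates.get)


# Placeholder numbers only; they do not reflect any real model or vendor.
PLACEHOLDER_SCORES = {
    "loss_run_extraction": {"provider_a": 0.97, "provider_b": 0.91},
    "triage_drafting": {"provider_a": 0.82, "provider_b": 0.88},
}

for task in PLACEHOLDER_SCORES:
    print(task, "->", pick_provider(PLACEHOLDER_SCORES, task))
```

Routing per task rather than per customer is what lets one workflow use different providers for extraction and for drafting.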

Sometimes that means going deep with the teams building the models themselves. That's what happened here with OpenAI. We brought the domain expertise: what a binder check requires, how a loss run maps to a risk profile, when a submission should be declined versus referred. They brought the infrastructure. The result was a measurable improvement in document extraction that our customers benefit from directly.

This is how we think about every model provider relationship. Our commitment is to insurance teams, not to any single lab. If a model works better for your workflow, that's the one we'll use.

Frequently asked questions

Does this change how I use FurtherAI? No. Your workflows, interface, and experience stay the same. The improvement is under the hood: stronger infrastructure powering the same tools you already use.

Is my data still secure? Yes. FurtherAI continues to operate under the same data-training policy and single-tenant architecture. Your data remains isolated and protected. This collaboration doesn't change our security model.

What types of documents benefit most from this improvement? Large, complex documents see the biggest gains. Think loss runs exceeding 500 pages, multi-document submission packages, and SOVs with hundreds of locations.

Do I need to do anything to get these improvements? No action required. We're rolling out the improvements to FurtherAI customers now.

Can I try FurtherAI if I'm not a customer yet? Absolutely. Get in touch and we'll show you how it works on your documents.

FurtherAI is the AI workspace purpose-built for insurance. Backed by Andreessen Horowitz, Y Combinator, and Nexus Venture Partners, FurtherAI supports customers writing over $15B in premiums across all 50 states.
