Decagon has announced Duet Autopilot, which it describes as the first AI agent that can identify, fix and verify its own performance gaps in customer experience without manual intervention. The platform identifies performance gaps in deployed agents, generates and tests fixes, then presents validated updates to human teams for review. Accompanying the launch is DuetBench, which Decagon claims is the first benchmark for evaluating agent self-improvement end to end. Tested against it, Autopilot completed 93 per cent of diagnostic tasks, “exceeding the average human score”.

What Is Autopilot and Where Did It Come From?

The announcement builds on a clear development arc. Decagon launched Duet in March 2026 as an AI partner that analyses transcripts, flags performance gaps and suggests workflow updates, cutting what had previously taken days of manual work down to minutes.

In April, Decagon published its rationale for building Duet on Agent Operating Procedures (AOPs), its system for defining agent behaviour in natural language, and confirmed that Duet had already been used to improve its own AOPs since launch.

Autopilot is the latest step. Rather than surfacing suggested improvements for teams to implement, it generates, tests and validates changes autonomously before presenting a versioned update for human sign-off.

Before Autopilot runs, teams define the parameters it must work within, covering tone, editorial standards, policy constraints and any workflows that should remain off-limits. Every proposed change is generated within those guardrails, then tested against two sets: the original conversation that surfaced the issue, and a curated set of hundreds of conversations representing the breadth of real customer intents. If a fix inadvertently breaks behaviour elsewhere, Autopilot identifies the regression and iterates until the update holds across both test sets.
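The generate, test and iterate cycle described above can be pictured as a simple loop. The Python sketch below is purely illustrative and is not Decagon's implementation: every name in it (`validate_fix`, `propose`, `passes`, `Candidate`, `max_attempts`) is a hypothetical stand-in, and the article does not say how Autopilot generates or scores fixes. It only shows the control flow of testing a candidate against both the triggering conversation and a regression set, then retrying when something fails.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Optional, Sequence, TypeVar

P = TypeVar("P")  # hypothetical: whatever represents the agent's procedure
C = TypeVar("C")  # hypothetical: whatever represents a test conversation


@dataclass
class Candidate(Generic[P]):
    procedure: P   # the proposed update, still awaiting human review
    attempts: int  # how many iterations it took to pass both test sets


def validate_fix(
    current: P,
    trigger: C,
    regression_set: Sequence[C],
    propose: Callable[[P, Sequence[C]], P],
    passes: Callable[[P, C], bool],
    max_attempts: int = 5,
) -> Optional[Candidate[P]]:
    """Iterate on a fix until it passes the trigger and the regression set.

    Returns a candidate for human sign-off, or None if no candidate
    passed within max_attempts. Nothing is deployed here.
    """
    candidate = current
    failing: Sequence[C] = [trigger]
    for attempt in range(1, max_attempts + 1):
        # Assumption: each proposal builds on the previous candidate.
        candidate = propose(candidate, failing)
        # Test against the original conversation and the curated set.
        failing = [c for c in [trigger, *regression_set]
                   if not passes(candidate, c)]
        if not failing:
            return Candidate(candidate, attempt)
    return None
```

In this sketch the function never applies a change itself. It returns a candidate or `None`, which mirrors the article's point that a validated update is presented to people for review rather than pushed to production automatically.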

Alan Yiu, VP of Product at Decagon, said: "Autopilot is a shift from building agents by hand to managing agents that improve themselves. Teams set the direction and review the work; Autopilot handles the diagnosing, testing, and editing that used to consume their week. Every fix compounds, which ultimately empowers businesses to provide their customers with a 24/7 AI concierge that gets measurably better with every interaction."

From Automation to Autonomous Optimisation

Customer service AI has spent several years proving its value in automating individual interactions, handling queries, resolving tickets and reducing escalation rates. The harder challenge is what comes after the initial build. Maintaining and improving agents as customer behaviour shifts and new edge cases emerge has remained largely a manual task, relying on QA processes, manual reviews and periodic workflow updates. Self-improving systems could compress those improvement cycles significantly, with each validated fix building on the last rather than starting fresh.

Matt McCollum, senior manager of customer experience at Opendoor, is among the early design partners testing the platform: "At our scale, manually reviewing conversations for errors isn't an option. Decagon Autopilot frees our team to focus on decisions rather than digging through logs. It surfaces what changed, what was considered, and why. That transparency is what makes AI actually trustworthy in production."

Decagon is validating Autopilot with enterprise customers across financial services, retail and consumer technology, measuring its effect on resolution rates, escalation rates and coverage.

The Governance Question

Self-improvement introduces risks that will concern CX and compliance teams in equal measure, and AI governance in customer experience is becoming a more pressing concern across the industry. Decagon's staged approval model addresses this by requiring human sign-off before any change reaches production. Reviewers receive a complete picture of what was found, what was changed and why, including test outcomes and a line-by-line breakdown of proposed edits. The DuetBench benchmark adds a further layer, providing a formal basis for evaluating whether self-improvement is working and in what direction. As agentic systems take on more of the operational cycle, the ability to audit automated decisions may become as important as the intelligence behind them.

A Broader Industry Shift

Agentic AI in customer experience has been gathering momentum throughout 2026, with the market moving beyond the copilot model towards systems that can operate and manage themselves across complex workflows. The ability to maintain and improve agents without constant human intervention is emerging as a meaningful point of differentiation, and autonomous optimisation could yet become a standard expectation of enterprise CX platforms.

Vendors are already competing to own the agent layer, the infrastructure used to orchestrate and govern AI agents at scale. Autonomous optimisation may add a new front to that contest. The battleground is set to expand from which platform best manages agents to include which platform best improves them over time.
