TL;DR
- AI agents introduce bugs in complex codebases not because the models are unreliable, but because the agents are blind to the institutional knowledge behind your decisions — decisions that carry years of context, trade-offs, and consequences the code itself never reveals
- Every codebase has two implicit audiences: the machine and the next developer — AI agents are a third, and they're already reading your code
- An agent scanning an 8-year-old codebase encounters multiple architectural eras simultaneously, with no way to distinguish the intentional from the incidental
- Architecture Decision Records (ADRs) are a proven format that maps directly onto what an agent needs: decision, rationale, constraints, and known trade-offs
- You don't need to retrofit the entire codebase — start at the boundaries between architectural eras and business-driven shortcuts
The Concern Is Valid. The Diagnosis Usually Isn't.
If you've been following the discourse around AI coding agents, you've seen the stories. Bugs introduced at scale. Subtle regressions. Code that looks right but behaves wrong under specific conditions. Production databases deleted — backups included. Well-known SaaS platforms down for hours while engineers tried to reconstruct what the agent had done and undo it.
The concern is legitimate. But the diagnosis — "AI agents are unreliable" — misses the actual failure point.
The agent isn't the problem. The missing context is.
You've Always Written Code for Two Audiences
Every technical decision your team has ever made was ultimately written down in code for two readers: the machine that executes it, and the developer who maintains it.
Your naming conventions, your abstractions, your module boundaries — all of it implicitly addresses one of these two audiences. Good engineering is the practice of serving both simultaneously.
AI coding agents are a third reader. They have no memory of the Slack thread where your team decided to leave that legacy module untouched. They don't know that a particular shortcut was a deliberate business call, validated by product leadership, with a known remediation plan. They can't tell which parts of the codebase were built last year and which were built during a different architectural era under completely different constraints.
When an agent scans your codebase, it reads everything at once. It has no way to distinguish the intentional from the incidental.
What Happens When the Third Reader Has No Briefing
Consider what an 8-year-old codebase looks like from an agent's perspective.
There are modules following current patterns alongside modules that haven't been touched in years. There are abstractions that exist because a vendor constraint shaped how the system grew — but that constraint is invisible in the code. There are places where the "right" architecture was deliberately skipped in favor of faster validation, with the intention to revisit later. That intention lives in someone's memory, in a Slack message from 2021, or nowhere at all.
The agent looks at all of this and tries to synthesize a solution. It finds the most common pattern and applies it. But the most common pattern in your codebase may be the wrong one for the specific module being edited. And the shortcut that was carefully scoped by a business decision gets extended — because the agent had no way to know it was bounded.
This is where the bugs come from. Not from AI. From absent context.
ADRs: Briefings for the Third Audience
Architecture Decision Records were proposed by Michael Nygard in 2011. The format is simple: a short, numbered document capturing a single architecturally significant decision — what was decided, the context that shaped it, the status, and the consequences.
Most teams that use ADRs think of them as documentation for future developers. That's correct. It's also no longer the whole picture.
ADRs are structurally suited to briefing an AI agent. The format maps directly onto what an agent needs to generate context-appropriate code:
- Context — the forces at play: technical, business, organizational
- Decision — what was chosen, stated clearly
- Status — whether this is still active or has been superseded
- Consequences — the known trade-offs, including the ones that were accepted and parked
That last section is the most valuable. The consequences section is where institutional knowledge lives. It's where you write: "We accepted this limitation as a business call in Q3 2022. This pattern should not be extended." An agent that can read that statement has a reason not to extend the pattern; one that can't has no way to know the boundary exists.
The format also handles architectural inconsistency directly. An ADR marked superseded tells an agent that the old pattern it found in the legacy module has been deliberately replaced. Without that signal, the agent has no way to know whether the old pattern is the standard or the exception.
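To make the format concrete, here is a minimal sketch of an ADR in the Nygard style; the module, vendor, dates, and numbering are all hypothetical.

```markdown
# ADR-0012: Keep the billing module on the vendor's batch export API

Status: Accepted (supersedes ADR-0007)

## Context
The vendor's real-time API was not available when billing was built in 2019.
The batch export shaped the module's nightly-job architecture, a constraint
that is invisible in the code itself.

## Decision
Billing stays on the batch export. New integrations must not copy this
pattern; they should use the event-driven approach described in ADR-0011.

## Consequences
We accept up to 24 hours of reporting lag. This was a deliberate business
call, validated by product leadership in Q3 2022, with remediation planned
once the vendor contract is renegotiated. Do not extend this pattern to
other modules.
```

The Status line carries the supersession signal described above; an agent (or a new teammate) reading the legacy module alongside this file learns that the old pattern is the exception, not the standard.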
Where to Start — Without Paralyzing the Team
The instinct when adopting any documentation practice is to retrofit it across the entire codebase. That's the wrong move.
Start at the boundaries. The places where architectural eras meet. The modules where shortcuts were taken under time pressure and then inherited by the rest of the system. The decisions that every new team member asks about in their first week.
Three well-placed ADRs in high-traffic, historically ambiguous areas are worth more than thirty ADRs on decisions that are self-evident from the code. Prioritize the decisions that are invisible — the ones that live only in the heads of people who were there.
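A common convention, popularized by Nygard's original post and by tooling such as adr-tools, is a flat directory of short, numbered markdown files. A hypothetical starting set focused on boundaries rather than coverage might look like:

```
docs/adr/
  0001-record-architecture-decisions.md
  0002-keep-billing-on-vendor-batch-export.md
  0003-auth-module-predates-current-service-boundaries.md
```

Three files, each covering a high-traffic, historically ambiguous decision; the filenames alone already give a scanning agent a map of where the codebase's eras meet.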
When you onboard an AI agent into a complex system, you're not handing it a codebase. You're handing it a codebase plus whatever context you've chosen to make explicit. ADRs are the tool for making that choice deliberately, before the agent makes it for you.
The Teams Getting This Right
The teams successfully deploying AI agents into complex, long-lived codebases are not using better AI than everyone else.
They're giving their AI better material to work with.
The third audience has arrived. The engineering question is no longer whether AI agents will read your codebase — they already are. The question is whether you've written for them.
If your answer is "not yet," the place to start is the decision your new developers always ask about in their first week. Write that one down first.
Key Takeaways
- AI bugs in complex codebases are almost always a context problem, not a tool problem — diagnose before you blame the agent
- Your codebase now has a third audience: AI agents that read everything simultaneously with no institutional memory and no way to distinguish intentional decisions from incidental ones
- ADRs are not just developer documentation — they are structured briefings for an agent, and the consequences section is where the most valuable signal lives
- Start ADR adoption at architectural boundaries and historically ambiguous decisions, not across the entire codebase — three precise ADRs outperform thirty generic ones
- The teams winning with AI coding agents are investing in context infrastructure; better prompts alone will not compensate for an undocumented codebase