I was looking at a competitor’s docs the other day — I won’t say which one — and I counted. They had fifteen named agents. A code review agent. A testing agent. A documentation agent. A deployment agent. A “planning” agent. Each with its own configuration, its own persona, its own set of hardcoded instructions.
And I thought: isn’t this just fifteen config files wearing trench coats?
I can say this with some authority, because six months ago we were doing the same thing.
We built the agent zoo
Back in December, we had it all. A review agent with a carefully crafted system prompt explaining our review standards. A testing agent that knew about our Jest setup and coverage thresholds. A deployment agent with step-by-step instructions for our staging pipeline. Each one had its own prompt file — handwritten, lovingly maintained, version-controlled.
I remember feeling productive. We’d built this sophisticated system. An engineer would say “review this PR” and the router would dispatch it to the review agent, which had two pages of instructions about what to check. It worked. For about six weeks.
Then the problems started.
The agent explosion is a confession
The first crack was subtle. Someone updated our ESLint config to add a new rule. The review agent didn’t know about it. It approved code that CI rejected. No big deal — someone updated the review agent’s prompt. But then the testing agent was still referencing Jest in its examples, even though we’d migrated to Vitest in March. And the deployment agent was describing a Docker build step we’d replaced with a multi-stage build two sprints ago.
Each agent was a snapshot of the team’s knowledge at the moment someone wrote its prompt. The team kept moving. The agents didn’t.
When a product ships fifteen specialized agents, what it’s actually telling you is: “Our AI can’t figure out the right approach from context, so we need you to pick the right mode.” Think about what a “review agent” does differently from a “coding agent.” It reads code. It reasons about it. It produces output. The difference isn’t capability — it’s which files it reads, what it’s looking for, and what format the output takes. These are context differences, not capability differences.
A specialized agent for each task exists only because the AI doesn't know enough to handle that task with its general intelligence. It's a crutch for missing knowledge. We just didn't see it yet.
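To make that concrete, here's a rough sketch in TypeScript. Everything in it is invented for illustration (no real framework, no real API), but it shows the shape of the argument: the "agent" is just a context object handed to the same model call.

```ts
// Hypothetical sketch, not any real framework: a "review agent" and a "coding
// agent" collapse into one model call that differs only in the context it gets.

type Task = {
  prompt: string;        // what the user asked for
  filesToRead: string[]; // which files the model should look at
  lookFor: string;       // what it should pay attention to
  outputFormat: string;  // how the result should be presented
};

// Stand-in for whatever model client you actually use.
async function callModel(context: string, prompt: string): Promise<string> {
  return `model output for "${prompt}" given context: ${context}`;
}

// One general function; the "agent" is nothing but the Task object.
async function run(task: Task): Promise<string> {
  const context =
    `Read: ${task.filesToRead.join(", ")}. ` +
    `Focus on: ${task.lookFor}. ` +
    `Respond as: ${task.outputFormat}.`;
  return callModel(context, task.prompt);
}

// Same capability, different context.
const review: Task = {
  prompt: "Review this PR",
  filesToRead: ["the changed files"],
  lookFor: "style violations, missing tests, risky patterns",
  outputFormat: "a list of review comments",
};

const coding: Task = {
  prompt: "Add pagination to the users endpoint",
  filesToRead: ["src/routes/users.ts"],
  lookFor: "existing route and error-handling conventions",
  outputFormat: "a code diff",
};
```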
Slash commands are a step backwards
We compounded the problem with slash commands. /review. /test. /deploy. /explain. It felt natural — developers love command-line interfaces. But we’d spent fifty years building natural language understanding, and our answer to “make AI tools easier” was… a command-line interface?
I watched a junior engineer try to use the tool. They typed “can you review this code?” Nothing happened — the tool was waiting for /review. They didn’t know the command existed. So they googled it, found the docs, learned the syntax, and then typed the same request in a less natural format.
That moment stuck with me. We’d built something that required engineers to learn our vocabulary before they could use it. That’s not AI-augmented development. That’s a chatbot with macros.
The maintenance tax
Here’s the thing about specialized agents that nobody warns you about: the maintenance burden multiplies.
We had N agents, each carrying its own copy of the team's M conventions, which meant N times M things to keep current. The review agent had review standards. The testing agent had testing conventions. The deployment agent had deployment steps. When a convention changed, we had to find every agent that referenced it. Sometimes we'd update one and miss another. The agents would contradict each other — the review agent would flag something the coding agent had just generated.
An engineering manager I talked to described the same experience: “We set up all these agents, and for the first month they were great. Then people started complaining that the review agent was enforcing standards we’d deprecated. Nobody remembered to update it.”
It’s the same config-file staleness problem, except now you have fifteen config files and they need to stay in sync with each other.
And then you need an orchestrator
Here’s where it gets truly absurd. Once you have fifteen agents, you need something to coordinate them. An orchestrator. A meta-agent. A “planner” that decides which agent to invoke, in what order, with what inputs.
We built one. Of course we did. It was an LLM call that read the user’s request, classified it, and dispatched it to the right specialized agent. We were spending tokens to decide which agent to use before we spent tokens actually doing the work.
Then we needed failure handling — what happens when the orchestrator picks the wrong agent? Handoff logic — how does context transfer between agents in a multi-step task? Routing rules — the deployment task needs the testing agent first, then the build agent, then the deploy agent, in that order, with error handling at each step.
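For a sense of scale, here's a stripped-down sketch of that pattern. The names are made up and the real thing was messier, but this is the architecture in miniature: a model call to pick an agent, a hardcoded routing table, and a failure handler bolted onto each step.

```ts
// Illustrative sketch of the orchestrator pattern described above. The names
// are hypothetical, but the shape is the same: one model call to classify,
// then hardcoded routing to specialized agents, with ad-hoc failure handling.

type AgentName = "review" | "testing" | "build" | "deploy" | "docs";

// In the real system this was itself an LLM call, with its own prompt and bugs.
async function classify(request: string): Promise<AgentName> {
  if (/deploy|staging/i.test(request)) return "deploy";
  if (/test/i.test(request)) return "testing";
  if (/review/i.test(request)) return "review";
  return "docs";
}

// Each specialized agent was a hand-written prompt snapshot behind a call like this.
const agents: Record<AgentName, (req: string) => Promise<string>> = {
  review: async (req) => `review agent: ${req}`,
  testing: async (req) => `testing agent: ${req}`,
  build: async (req) => `build agent: ${req}`,
  deploy: async (req) => `deploy agent: ${req}`,
  docs: async (req) => `docs agent: ${req}`,
};

// Hardcoded routing rules: deployment means testing, then build, then deploy.
const routes: Partial<Record<AgentName, AgentName[]>> = {
  deploy: ["testing", "build", "deploy"],
};

async function orchestrate(request: string): Promise<string[]> {
  const target = await classify(request); // tokens spent deciding, not doing
  const steps = routes[target] ?? [target];
  const results: string[] = [];
  for (const step of steps) {
    try {
      results.push(await agents[step](request));
    } catch {
      // The "generic failure handler": retry once, then give up.
      results.push(await agents[step](request).catch(() => `${step} failed`));
    }
  }
  return results;
}
```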
We started with “I want the AI to review my code” and ended up building a distributed system. The orchestrator had its own bugs. The routing logic had edge cases. Someone had to debug why the orchestrator sent a refactoring request to the documentation agent.
This was the moment I started questioning everything. When your solution to “the AI doesn’t know enough” is an increasingly complex system of specialized AIs managed by a coordinating AI — you’re building Rube Goldberg machines. The complexity isn’t solving the problem. It’s hiding it behind more layers.
What actually replaced all of it
We ripped it out. All of it. The agents, the orchestrator, the prompt files, the routing logic. It took about two weeks and it was terrifying.
What replaced it was almost embarrassingly simple: one model, with knowledge.
Instead of a review agent with hardcoded review instructions, we store review conventions as organizational knowledge — structured items that update when the team’s practices change. Instead of a testing agent with a Vitest template, we have testing patterns learned from how the team actually writes tests. Instead of a deployment agent with step-by-step instructions, we have a pipeline stored as a knowledge item — an “atom” with steps that the AI discovers when you say “deploy to staging.”
The critical difference: the agents were snapshots. Someone wrote the review agent’s prompt in January. By June it was enforcing January’s standards. The knowledge items are living. When the team migrates from Jest to Vitest, the conversation where they discuss it produces a structured knowledge update. Every future session knows about Vitest. No one updates a prompt file. No one remembers to sync fifteen configs.
And complex workflows? A multi-agent system chains five agents together — test-runner, coverage-checker, build-agent, deploy-agent, health-verifier. When the pipeline changes, you update five configs. We store the pipeline as a single knowledge item. When step 3 fails, the AI knows how to handle it because it has context from the last time the Docker build failed — last Thursday, when Sarah fixed it by bumping the base image. A hardcoded orchestration framework calls a generic failure handler. The AI improvises with knowledge from actual prior failures.
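To give a feel for the difference, here's roughly what such a knowledge item could look like. This is my own sketch, not the actual schema; treat every field name as hypothetical. The point is one living record instead of five agent configs.

```ts
// Rough illustration of "the pipeline as a single knowledge item."
// Field names are invented for this sketch; the storage format isn't the point.

type KnowledgeItem = {
  id: string;
  kind: "convention" | "pattern" | "pipeline";
  lastUpdated: string;    // refreshed whenever the team's practice changes
  source: string;         // the conversation this knowledge came from
  steps?: string[];
  knownFailures?: { step: string; note: string }[];
};

const stagingDeploy: KnowledgeItem = {
  id: "deploy-to-staging",
  kind: "pipeline",
  lastUpdated: "after the switch to the multi-stage Docker build",
  source: "the thread where we replaced the old Docker build step",
  steps: [
    "run the Vitest suite",
    "check coverage thresholds",
    "multi-stage Docker build",
    "deploy to staging",
    "verify health endpoints",
  ],
  knownFailures: [
    {
      step: "multi-stage Docker build",
      note: "failed last Thursday; Sarah fixed it by bumping the base image",
    },
  ],
};
```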
The simplicity test
Here’s the test I use now for any AI development tool: can a new engineer start using it by typing what they want in plain English?
If they need to learn slash commands, configure agents, set up skills, or understand the orchestration model — the tool is pushing its complexity onto the user. That complexity exists because the tool doesn’t know enough to handle the request naturally.
I keep thinking about that junior engineer who typed “can you review this code?” and got nothing. In a knowledge-driven system, that sentence is enough. The AI knows what “review” means for this team. It knows the standards. It knows the conventions. It knows the last three things the team changed about their review process. No command required. No routing. No orchestrator.
We built the zoo. We maintained the zoo. And then we realized the zoo was the problem.
One model that remembers beats fifteen that don't.