I’ve been thinking about this framing for a while, and I think it captures the fundamental architectural split in AI tooling better than anything else I’ve come up with.

There are two ways to give an AI the context it needs. The industry picked one. I think they picked wrong.

How every tool works today

The pattern is the same everywhere. Your AI tool scans your codebase — files, directory structure, maybe some git history. Stuffs as much as it can into the context window. Sends the whole thing to the LLM. Hopes the model finds the relevant parts.

Cursor calls it “codebase indexing.” Copilot calls it “code referencing.” Claude Code reads files on demand. The implementation varies, but the architecture is identical: dump everything in, let the model sort it out.
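In code, the scan loop looks something like this. A minimal sketch, assuming a greedy fill of the window; every name here (`build_scan_context`, `MAX_CONTEXT_TOKENS`, the 4-chars-per-token heuristic) is illustrative, not any tool's actual API:

```python
# Sketch of the scan pattern. All names are illustrative;
# no real tool exposes exactly this interface.
from pathlib import Path

MAX_CONTEXT_TOKENS = 200_000

def estimate_tokens(text: str) -> int:
    return len(text) // 4  # rough heuristic: ~4 characters per token

def build_scan_context(repo_root: str, task: str) -> str:
    """Greedily stuff files into the window until it's full."""
    parts, used = [task], estimate_tokens(task)
    for path in sorted(Path(repo_root).rglob("*.py")):
        source = path.read_text(errors="ignore")
        cost = estimate_tokens(source)
        if used + cost > MAX_CONTEXT_TOKENS:
            break  # window full; everything after this is invisible to the model
        parts.append(f"# {path}\n{source}")
        used += cost
    return "\n\n".join(parts)  # send it all, hope the model finds the relevant parts
```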

I call this the Scan approach. And it has problems I don’t think are fixable within the paradigm.

Why scanning breaks

Context windows are finite. A medium-sized project has millions of tokens of source code. You can’t fit it all. So the tool has to guess which files matter — and it guesses wrong constantly. I’ve watched tools include entire test directories when the task is about production code, or load a database migration file when the engineer is working on a frontend component.

More fundamentally: scanning is O(n) in the size of your codebase. As the codebase grows, the problem gets worse. More files to index. More irrelevant context diluting the relevant parts. More tokens wasted on code the model doesn’t need for the current task.

But here’s the thing that really gets me: scanning can only see code. Your codebase contains source files. It doesn’t contain why you chose your architecture. It doesn’t contain the error pattern that burned two engineers last month. It doesn’t contain the fact that your frontend team prefers composition over inheritance, or that the one person who understands the billing pipeline just went on leave.

No amount of codebase scanning will surface this knowledge. It doesn’t live in files. It lives in conversations, decisions, and people’s heads.

The alternative I keep coming back to

What if instead of dumping everything in and hoping, you sent only what’s relevant — and gave the AI tools to find more when it needed to?

This is what I think of as the Seek approach. It works in layers:

Always-present: A small set of high-signal knowledge that matters for every interaction. Your team’s rules. The structural flows in your system. These are injected automatically because they always apply. A few hundred tokens, not thousands.

Context-aware: What the AI has learned while working in this specific context. Decisions it made. Patterns it discovered. Errors it hit. This is the AI’s working memory for the current task — and it persists across sessions.

On-demand: Everything else. The full organizational knowledge base, searchable by the AI when it needs it. Error patterns from six months ago. Team expertise maps. Deployment runbooks. The AI doesn’t carry this — it reaches for it when the task demands it.
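Putting the three layers together, here’s a rough sketch of how a seek system might assemble its prompt. The class and method names are hypothetical; the point is the shape of the architecture, not a real API:

```python
# Hypothetical sketch of the three seek layers. The store and
# retrieval interfaces are assumptions, not a shipping product's API.
from dataclasses import dataclass, field

@dataclass
class SeekContext:
    rules: list[str]                                              # always-present: a few hundred tokens
    working_memory: list[str] = field(default_factory=list)      # context-aware: persists across sessions
    knowledge_base: dict[str, str] = field(default_factory=dict)  # on-demand: the full org knowledge

    def build_prompt(self, task: str) -> str:
        # Layers 1 and 2 ride along automatically: kilobytes, not the whole repo.
        return "\n\n".join(["\n".join(self.rules), "\n".join(self.working_memory), task])

    def remember(self, fact: str) -> None:
        # Layer 2: record what the AI learned while working in this context.
        self.working_memory.append(fact)

    def lookup(self, query: str) -> list[str]:
        # Layer 3: exposed to the model as a tool; nothing is loaded until asked for.
        return [text for key, text in self.knowledge_base.items() if query.lower() in key.lower()]
```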

The math that convinced me

Scan:

[200K token context window]
├── 50K: source files (maybe relevant, maybe not)
├── 30K: conversation history
├── 10K: system prompt
└── 110K: remaining capacity (shrinks every turn)

Seek:

[200K token context window]
├── 2K: rules that always apply
├── 3K: knowledge from this context
├── 10K: system prompt
└── 185K: available for actual work

The seek model leaves ~93% of the context window free for the current task. The scan model wastes 25-50% on context that might not be relevant.
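The arithmetic is easy to check; the numbers below are the ones pictured in the diagrams above:

```python
WINDOW = 200_000

scan_overhead = 50_000 + 30_000 + 10_000  # source files + history + system prompt
seek_overhead = 2_000 + 3_000 + 10_000    # rules + context knowledge + system prompt

print(f"scan: {(WINDOW - scan_overhead) / WINDOW:.1%} of the window left for work")  # 55.0%
print(f"seek: {(WINDOW - seek_overhead) / WINDOW:.1%} of the window left for work")  # 92.5%
```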

But the efficiency difference, honestly, isn’t the most important part. The most important part is what you can represent.

What seek can surface that scan can’t

| Knowledge type | In files? | In a seek system? |
| --- | --- | --- |
| Current source code | Yes | Yes (file tools) |
| Why you chose this architecture | No | Yes |
| Known error patterns | No | Yes |
| Team conventions | Partially | Yes |
| Who knows what | No | Yes |
| Past incidents | No | Yes |
| What was done last week | No | Yes |
| Git history context | Partially | Yes |

A scan system gives the AI your code. A seek system gives the AI your organization’s knowledge. These are fundamentally different products masquerading as the same category.

The self-priming insight

The part that took me the longest to figure out: the best source of organizational knowledge is the AI’s own conversations.

When an engineer explains to the AI why they’re choosing a particular approach, that’s a decision being made. When they discover a coupling between services while debugging, that’s an insight being created. When they fix a bug and explain the root cause, that’s an error pattern being documented.

These moments happen every day. The knowledge is right there — fresh, contextualized, structured. In a scan system, it evaporates when the session ends. In a seek system, it’s captured, stored, and available to the entire team.
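What could capture look like in practice? A minimal sketch, where a keyword trigger stands in for whatever a real system would use (most likely an LLM classification pass); the categories and the JSONL store are my assumptions:

```python
# Hypothetical capture hook: spot knowledge-bearing moments in a
# conversation and persist them. The keyword cues are a toy stand-in
# for a real classifier.
import json, time

CATEGORIES = {
    "decision": ("we chose", "decided to", "going with"),
    "insight": ("turns out", "discovered that", "is coupled to"),
    "error_pattern": ("root cause", "the bug was", "fails when"),
}

def capture(message: str, store_path: str = "knowledge.jsonl") -> None:
    lowered = message.lower()
    for category, cues in CATEGORIES.items():
        if any(cue in lowered for cue in cues):
            record = {"ts": time.time(), "category": category, "text": message}
            with open(store_path, "a") as f:
                f.write(json.dumps(record) + "\n")
            return  # captured once; available to the whole team from now on
```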

No documentation sprints. No wiki maintenance. The knowledge just accumulates because people use the tool.

The compounding difference

This is the part that keeps me up at night, because I think the implications are bigger than most people realize.

Scan systems are stateless. The 1,000th session is exactly as informed as the 1st. Seek systems compound. The 1,000th session has access to everything the organization learned in the first 999.

Without compounding, your team’s effective knowledge equals the smartest person in the room. With it, your team’s effective knowledge equals the sum of everything anyone ever learned.

The difference between scan and seek isn’t a feature. It’s an architecture. And architecture is hard to change once you’ve committed.


Next: the counter-intuitive insight that made our knowledge capture work — and why a separate RAG pipeline is the wrong approach.