wayfinder · design write-up

Onboarding copilots shouldn't sound confident before they're grounded

Haichuan Zhou · July 2026 · source on GitHub · live demo

Most codebase onboarding tools have the same failure mode: they sound useful before they are grounded. They can summarize a README, list plausible modules, or explain a function name that looks familiar. What they usually can't tell you is which statements came from code evidence, which are assumptions, and which are contradicted by the tests.

wayfinder is my answer to that problem. It's a multi-agent codebase onboarding copilot built around evidence, not narration. Given a repository and a question, it routes the task through a LangGraph Supervisor, calls deterministic MCP servers for code facts, labels high-risk claims, and writes final answers that preserve uncertainty instead of hiding it.

Three agents, three kinds of evidence

The system is intentionally split into role-contracted agents, each bound to a deterministic tool I wrote as a standalone MCP server:

architect_mapper handles orientation. It calls mcp-repo-mapper, which scans file structure, language breakdown, framework signals, entry points, and the Python import graph. Its job is to report what the repo provably contains.
entry_explainer explains symbols and call paths via mcp-ast-explorer — LibCST-backed definitions, signatures, references, call chains, class hierarchies. If a symbol doesn't exist, the tool returns a structured not-found. wayfinder is not allowed to invent a nearby symbol just because the query sounds plausible.
verifier challenges high-risk claims. Static facts get labeled from AST evidence; runtime behavior needs executable evidence from mcp-test-runner, which runs only inside a sandboxed worker.
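
The structured not-found behavior described for entry_explainer can be illustrated with a minimal sketch. This is not the actual mcp-ast-explorer code: the real tool is LibCST-backed, while this sketch uses a plain dictionary as a stand-in index. The names SymbolFound, SymbolNotFound, INDEX, and lookup_symbol, along with the file paths, are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SymbolFound:
    """A definition that exists in the (hypothetical) index."""
    name: str
    file: str
    line: int


@dataclass(frozen=True)
class SymbolNotFound:
    """Structured miss: the caller sees the query and an explicit reason."""
    query: str
    reason: str = "symbol_not_found"


# Hypothetical in-memory index standing in for the result of an AST scan.
INDEX = {
    "create_app": SymbolFound("create_app", "app/main.py", 12),
    "load_settings": SymbolFound("load_settings", "app/config.py", 5),
}


def lookup_symbol(query, index=INDEX):
    """Exact-match lookup that never substitutes a similarly named symbol."""
    hit = index.get(query)
    if hit is None:
        return SymbolNotFound(query=query)
    return hit


if __name__ == "__main__":
    print(lookup_symbol("create_app"))
    print(lookup_symbol("create_ap"))  # near miss: still a structured not-found
```

The point of returning a typed miss rather than None or a fuzzy match is that the calling agent has something explicit to report, so a plausible-sounding query cannot quietly turn into an answer about a different symbol.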

"Unverified" is a product state

Every claim ends up with one of three labels: verified, unverified, or contradicted. The middle label is the important one. No test coverage, an unsupported language, a timeout, malformed output — all of those become unverified, visibly. The system does not silently count uncertainty as success. And when a selected test directly conflicts with a claim, the claim becomes contradicted and the final prose is rewritten through a bounded reflection layer (capped at two rewrites, then the cap itself is disclosed).
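
A rough sketch of the labeling rule and the bounded reflection loop, under stated assumptions: the outcome strings ("evidence_supports", "evidence_conflicts", and so on), the function names, and the wording of the disclosure note are hypothetical, and the real verifier may be structured quite differently. Only the three labels and the cap of two rewrites come from the description above.

```python
from enum import Enum


class Label(str, Enum):
    VERIFIED = "verified"
    UNVERIFIED = "unverified"
    CONTRADICTED = "contradicted"


def label_claim(outcome):
    """Map a (hypothetical) evidence outcome string to one of three labels."""
    if outcome == "evidence_supports":
        return Label.VERIFIED
    if outcome == "evidence_conflicts":
        return Label.CONTRADICTED
    # No coverage, unsupported language, timeout, malformed output, and any
    # unknown outcome all stay visibly unverified instead of passing silently.
    return Label.UNVERIFIED


MAX_REWRITES = 2  # the cap described in the text


def reflect(draft, is_consistent, rewrite, max_rewrites=MAX_REWRITES):
    """Rewrite the draft until it is consistent or the cap is reached.

    Returns (text, rewrites_used, cap_hit). When the cap is hit, a note is
    appended so the reader can see that the cap was reached.
    """
    rewrites = 0
    while not is_consistent(draft) and rewrites < max_rewrites:
        draft = rewrite(draft)
        rewrites += 1
    cap_hit = not is_consistent(draft)
    if cap_hit:
        draft += "\n[Note: rewrite cap reached; contradicted claims may remain.]"
    return draft, rewrites, cap_hit
```

The default branch in label_claim is the design choice that matters here: anything the sketch does not recognize falls through to unverified, so a new failure mode cannot be counted as success by accident.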

You can see this behavior in the live demo today: ask an architecture question on a public repo and the run completes with its claim counts on display. When the evidence packet can't support a claim, the answer says exactly that — which I'd argue is the single most trust-building thing a code assistant can do.

Observability as a schema contract

The most consequential engineering decision wasn't in the agent graph — it was deciding that observability starts as a schema contract. Every run emits stable metadata keys: agent name, tool name, MCP server, tokens, latency, cost, claim id, thread id, phase. LangSmith can consume those fields when tracing is on, but the Next.js dashboard and my separate eval harness (agent-eval-harness) rely on the same shape without live credentials. That one contract is what later made it possible to benchmark this architecture against a ReAct baseline under identical metrics.
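
To make the idea of a schema contract concrete, here is a minimal sketch of what such a record might look like. The field list mirrors the metadata keys named above, but the exact key spellings, the types, the units (milliseconds, US dollars), and the RunEvent and validate_event names are assumptions for illustration, not the project's actual schema.

```python
from dataclasses import dataclass, asdict, fields
from typing import Optional


@dataclass(frozen=True)
class RunEvent:
    """One metadata record per step; key names and units are assumed."""
    agent_name: str
    tool_name: str
    mcp_server: str
    tokens: int
    latency_ms: float
    cost_usd: float
    claim_id: Optional[str]  # None when a step is not tied to a claim
    thread_id: str
    phase: str


REQUIRED_KEYS = frozenset(f.name for f in fields(RunEvent))


def validate_event(payload):
    """Reject payloads whose keys drift from the contract, then build the record."""
    keys = set(payload)
    missing = REQUIRED_KEYS - keys
    extra = keys - REQUIRED_KEYS
    if missing or extra:
        raise ValueError(f"schema drift: missing={sorted(missing)} extra={sorted(extra)}")
    return RunEvent(**payload)


if __name__ == "__main__":
    event = validate_event({
        "agent_name": "verifier",
        "tool_name": "run_tests",  # hypothetical tool name
        "mcp_server": "mcp-test-runner",
        "tokens": 812,
        "latency_ms": 1430.5,
        "cost_usd": 0.0021,
        "claim_id": "c-3",
        "thread_id": "t-42",
        "phase": "verify",
    })
    print(asdict(event))
```

Because validation depends only on the record's shape, any consumer (a tracing backend, a dashboard, an offline eval harness) can read the same events without needing access to the others.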

What it isn't

wayfinder is not trying to be a general code chatbot. It's a narrow workflow with a narrow promise: map the repo, explain entry paths, verify risky claims, and show where the answer is uncertain. That narrowness is the point — the interesting work isn't hidden in a prompt, it's visible in the state schema, the MCP boundaries, the verification labels, the failure-mode table, and the dashboard evidence.

Built with Python 3.11, FastAPI, LangGraph, three self-authored MCP servers, and a Next.js dashboard, deployed on Railway. Questions? Ask the chatbot on my homepage — it retrieves from this project's actual design notes and cites its sources.