wayfinder · production postmortem

The timeout wasn't the bug

Haichuan Zhou · July 2026 · mcp-ast-explorer · wayfinder

While putting a screenshot of wayfinder's deployed instance on this site, I noticed something embarrassing: every symbol question — the exact query type the system is built to answer with verified, file-and-line evidence — came back with zero verified claims. The answers were honest about it ("I can't verify where this is defined from the evidence in this packet"), which is the product working as designed. But the evidence should have been there.

Following the trace

wayfinder persists per-run trace metadata precisely for moments like this. The run record pointed straight at the failing hop:

Entry explanation degraded for BaseCommand.invoke: the AST evidence tool failed. Evidence limitation: MCP tool timed out after 8s.

The deployed config ran MCP tool calls with an 8-second timeout and a single attempt. So the question became: why does an AST lookup take more than 8 seconds?

Reading mcp-ast-explorer's server code gave the answer: every tool call rebuilt the full LibCST index from scratch. find_definition, function_signature, find_references, call_chain — each one called build_cst_index(path) with no caching. I timed it locally against pallets/click (63 Python files): ~5.4 seconds per build on an M-series laptop. On Railway's shared vCPU, comfortably past 8 seconds — and a single grounded run makes several of these calls. Every symbol question was structurally guaranteed to time out.

(A fun detour: my first test query asked about BaseCommand.invoke, and even after fixing the timeout the answer refused to locate it. That wasn't a bug — BaseCommand no longer exists on click's main branch. The honest not-found was correct. Debug with symbols that exist.)

Stopgap, then the real fix

First, stop the bleeding with config: tool timeout 8s → 30s, one attempt → two, graph-node timeout raised to cover both. Symbol questions — now asking about Command.invoke, which does exist — immediately started returning verified 3 / unverified 1: definition at src/click/core.py:1353, signature, qualified name, all labeled from AST evidence. But each grounded run took 44.7–49.0 seconds, because the index was still being rebuilt on every tool call.

The real fix is a small, boring cache in mcp-ast-explorer (v0.2.0): an in-process CstIndexCache keyed by resolved repo root, invalidated by a per-file (path, size, mtime_ns) fingerprint. Stat-ing every file is cheap next to re-parsing it, and any edit, addition, or deletion changes the fingerprint and triggers a rebuild. Locally: cold build 5.6s, warm hit 4ms.

"Deployed" is a claim — verify it

Here's the part of the story I didn't expect to write. The cache fix was implemented and I believed it was deployed. When I measured the live instance, nothing had changed. Checking the chain end to end: the code existed only as uncommitted changes in a local worktree; GitHub main didn't have it; PyPI was two versions behind; and wayfinder's Dockerfile installed the dependency from git+...@main — so even a successful rebuild had faithfully shipped the old code.

The fix for that class of failure is the same discipline as everything else in this stack: commit, push, and pin the dependency to a commit SHA in the Dockerfile. The pin doubles as a Docker layer-cache bust — bumping the SHA forces the install layer to rebuild, so "it built" and "it shipped" can't quietly diverge again.

Results, measured on production

Before any fix: every symbol question timed out — 0 verified claims, degraded answers.
Timeout raised (index still rebuilt per call): 44.7–49.0s per grounded run, verified 3 / unverified 1.
Cache deployed, cold container: 18.7s — the first AST call builds the index once and the rest of the run reuses it.
Cache warm: 5.2s per grounded run. Roughly 9× faster than the timeout-only stopgap, with identical verification labels.

Takeaways

A timeout you have to raise is a smell, not a fix. The 8s limit wasn't wrong — it was correctly refusing to wait for work that shouldn't exist. Raising it bought time to remove the work.
Trace → source → local reproduction beats guessing. The run metadata named the failing tool, the tool's source named the wasted work, and a two-line timing script confirmed it before touching production.
"Deployed" is a verifiable claim. Treat it like one: the artifact serving traffic either contains the commit or it doesn't. SHA-pinned dependencies make the answer checkable from the Dockerfile alone.
There's a pleasing symmetry in debugging a verified-claims system this way: every step of the diagnosis was itself a claim backed by evidence — a trace record, a timing measurement, a git log, a production run. That's the whole thesis of the project, applied to its own operations.

Evidence for the numbers above: run-by-run timeline and the raw production run records (verbatim GET /runs snapshot). The cache: mcp-ast-explorer commit 8126b9f. The pin: wayfinder commit b3e9b4b. Or ask my homepage chatbot — it retrieves from the projects' real docs and cites its sources.