Tracing AMQP across 47 repositories without touching production.

How we built cross-repo edge resolution with sandboxed WASM plugins, why we abandoned static type-flow analysis halfway through, and the trick that finally made routing-key resolution deterministic.

Fig 1 — A slice of the order platform (api-gw, order-svc, payment, ledger, notify), statically resolved across 47 repositories: 312 edges, resolved cross-repo at index time.

When we tell people ArchMellon detects communication edges from source code alone, the most common response is a polite, slightly skeptical nod. The skepticism is fair. Of course you can find http.Post calls. The hard part isn't the calls — it's resolving them: which repository owns the handler, which routing key they actually publish to, whether the consumer over in payment-svc still listens to that key after last quarter's refactor.

This post walks through how we solved that problem. It is not a victory lap. We rewrote the resolver three times, abandoned a perfectly nice type-flow analysis halfway through, and only got it deterministic after a lunch conversation with someone who'd built a similar thing for Erlang in 2015. None of this was on the original Linear ticket.

The problem, concretely

Take the most common pattern we see: a producer calls mesh.SendCommand("process_payment", ...) in order-svc, and somewhere across 47 repositories there's a consumer wired to that string. From the call site alone, you have a literal — a string — and a function name. You don't have:

  • A guarantee that the string is actually a routing key (it could be a metric name, a feature flag, anything).
  • A mapping from that key to the consuming service. That mapping lives in whatever AMQP topology the team has set up — possibly in YAML, possibly in code, possibly in someone's head.
  • Any way to tell whether the consumer is currently active. The consumer could have been deleted last week.

Runtime tracing solves this beautifully — until you realize it can't see cold paths. If process_payment only fires on a Tuesday, and you sample on a Wednesday, the edge is invisible. We needed a static answer.

The constraint that shaped everything: source code never leaves the customer's network. Whatever resolver we built had to fit in a single binary, run on commodity hardware, and complete a full reindex of a 47-repo codebase in under five minutes.

Approach v1: type-flow analysis (what we abandoned)

The first instinct of any compiler person is type-flow. Trace the literal "process_payment" through the AST, follow it into SendCommand's parameters, and reach the AMQP client's internal routing table. It's a beautiful problem on a whiteboard.

It's a nightmare in practice. Half the routing keys are constructed via fmt.Sprintf. Another quarter come from environment variables. The remaining quarter come from a constants file that's auto-generated from a YAML schema in a sibling repo. Type-flow analysis can chase the Sprintf half on a good day and gets nothing useful for the rest.

We spent four weeks on this. The tipping point was a Slack thread where Dmitri pointed out that we had the wrong abstraction. We weren't trying to analyze the program. We were trying to describe it — at the level of conventions teams already use. That meant the right primitive wasn't a static analyzer. It was a plugin.

We weren't trying to analyze the program. We were trying to describe it — at the level of conventions teams already use. — Slack thread, week 5

Approach v2: WASM plugins as a contract

The shape we landed on: ArchMellon parses the AST and offers it to a sandboxed WebAssembly plugin via a small set of host functions. The plugin gets to visit nodes — function calls, struct literals, constant declarations — and emit edges. We ship system plugins for AMQP, gRPC, Kafka, NATS, and HTTP. Customers can write their own for in-house conventions.

The contract is intentionally narrow:

comm-detector-pdk/src/lib.rs
pub trait CommDetector {
    // Called for every call-like node in the AST (function and method calls).
    fn visit_call(&mut self, ctx: &CallContext) -> Vec<Edge>;

    // Called once per repository — for cross-call state.
    fn visit_module(&mut self, ctx: &ModuleContext) -> Vec<Edge>;

    // Confidence — used by the resolver downstream.
    fn confidence(&self, edge: &Edge) -> f32;
}

That's it. Three methods. The plugin author writes whatever pattern-matching they want against the AST cursor, emits edges with a confidence score, and we handle the rest — sandboxing, signing, indexing, cross-repo resolution.
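
To make the contract concrete, here's roughly what a customer plugin for a hypothetical in-house bus.emit("billing.invoice") convention could look like. A sketch only: EventBusDetector, its Topology lookup, and the file path are illustrative names, while the trait and the context helpers are the ones shown above.

example-detector/src/lib.rs
// Sketch of a customer plugin for a hypothetical in-house bus.emit() wrapper.
// EventBusDetector and Topology are illustrative, not part of the shipped PDK.
pub struct EventBusDetector {
    topology: Topology, // key -> consuming service, loaded from in-house config
}

impl CommDetector for EventBusDetector {
    fn visit_call(&mut self, ctx: &CallContext) -> Vec<Edge> {
        // Only react to the in-house wrapper, nothing else.
        if ctx.method_name() != Some("emit") {
            return vec![];
        }
        // Literal keys only: if the argument doesn't resolve to a string,
        // emit no edge rather than guessing.
        let Some(key) = ctx.first_arg().resolve_string() else { return vec![]; };
        let target = self.topology.lookup(&key).unwrap_or("<unresolved>");

        vec![Edge {
            from: ctx.repository().to_owned(),
            to: target.into(),
            protocol: "EventBus".into(),
            routing_key: key,
            confidence: 0.9, // literal-only detector, so one flat score
        }]
    }

    fn visit_module(&mut self, _ctx: &ModuleContext) -> Vec<Edge> {
        vec![] // no cross-call state to flush for this convention
    }

    fn confidence(&self, edge: &Edge) -> f32 {
        edge.confidence
    }
}

Compiled to WASM and signed, a plugin like this runs alongside the system plugins and gets the same sandboxing, indexing, and resolution treatment.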

A concrete example: AMQP detection

Here's roughly what our system AMQP plugin does when it visits a call node:

amqp-detector/src/visit.rs
fn visit_call(&mut self, ctx: &CallContext) -> Vec<Edge> {
    let Some(method) = ctx.method_name() else { return vec![]; };
    if !AMQP_PUBLISH_METHODS.contains(method) { return vec![]; }

    let routing_key = ctx.first_arg().resolve_string()
        .unwrap_or_else(|| ctx.heuristic_key());

    let target = self.topology
        .lookup(&routing_key)
        .unwrap_or("<unresolved>");

    vec![Edge {
        from: ctx.repository().to_owned(),
        to: target.into(),
        protocol: "AMQP".into(),
        routing_key,
        confidence: if ctx.first_arg().is_string_literal() { 0.97 } else { 0.65 },
    }]
}

Two things to notice. First, the confidence score isn't a vibe — it's a function of how deterministically we resolved the routing key: literals get 0.97, heuristics get 0.65. Second, confidence gates how the edge is presented: below 0.4, we surface it as "suspected" in the UI rather than asserting it.
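
To make the cutoff concrete, here's a minimal sketch of how a downstream resolver could bucket edges on that score. EdgeStatus and classify are illustrative names; only the 0.4 threshold comes from the behavior described above.

// Sketch: bucketing edges by confidence before rendering. Names are ours,
// not the shipped resolver API; the 0.4 cutoff is the one described above.
enum EdgeStatus {
    Asserted,  // rendered as a definite edge
    Suspected, // rendered, but visibly tentative
}

fn classify(confidence: f32) -> EdgeStatus {
    if confidence >= 0.4 {
        EdgeStatus::Asserted // literals (0.97) and heuristics (0.65) both land here
    } else {
        EdgeStatus::Suspected
    }
}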

Cross-repo resolution: the lunch conversation

The naive approach is to scan every repository for AMQP consumer registrations, build a global table from routing key to service, and look up producers against it. This works fine on toy datasets. On 47 real repositories with overlapping naming conventions, it falls apart. Three different teams had the routing key "order.created" — one was a domain event, one was a metric, one was a Kafka topic name accidentally reused.

The fix: don't resolve at lookup time. Resolve at index time, but keep the ambiguity as a graph property. We promote the routing key from a string to a scoped identifier — namespace + key + protocol — and let the resolver fan out across every scope that matches instead of forcing a single winner.
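
In code, that promotion is small, but it changes what the index can express. A minimal sketch, assuming illustrative names (ScopedKey, Resolver) rather than the real internals:

use std::collections::HashMap;

// Sketch: the index key stops being a bare string. The three "order.created"
// owners from above become three distinct entries instead of one collision.
#[derive(Hash, PartialEq, Eq, Clone, Debug)]
struct ScopedKey {
    namespace: String, // owning team or bounded context
    key: String,       // the raw routing key, e.g. "order.created"
    protocol: String,  // "AMQP" vs "Kafka" vs a metric pipeline
}

struct Resolver {
    consumers: HashMap<ScopedKey, Vec<String>>, // scoped key -> consuming services
}

impl Resolver {
    // Fan out: return every scope that shares the raw key, and let the caller
    // keep the ambiguity as parallel candidate edges in the graph.
    fn resolve(&self, raw_key: &str) -> Vec<(&ScopedKey, &[String])> {
        self.consumers
            .iter()
            .filter(|(k, _)| k.key == raw_key)
            .map(|(k, v)| (k, v.as_slice()))
            .collect()
    }
}

All three "order.created" owners survive as distinct nodes; the confidence machinery above decides which candidate edges get asserted.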

What we learned

Three takeaways for anyone building static comm detection:

  1. Don't fight the conventions; describe them. Every team has a slightly weird messaging idiom. A plugin contract beats a smarter universal analyzer.
  2. Confidence is a first-class output. "We saw this edge with 0.65 confidence" is more useful than a binary present/absent.
  3. Resolve at index time, not query time. Push the ambiguity into graph structure rather than into the resolver's runtime cost.

If you're trying this: the plugin SDK lives at comm-detector-pdk and works in any language with extism-pdk support.

What's next

We're working on extending the plugin contract to support stateful detectors — patterns that need to maintain state across calls. Plus Java and C# language support.

If you've built something similar and want to compare notes — or have a messaging convention we don't handle yet — drop us a line. We're listening.

Try this on your repos.

Beta is open. Five-minute setup. Source never leaves your network.