I forget, therefore I am

On what an AI methodology has to leave behind.

An unchecked rule will outlive its premise.

If you’ve ever inherited a company wiki nobody touches, or a coding standards doc full of rules from a project that shipped four years ago, you already know the shape of this problem. The page exists. People still link to it. Nobody trusts it. And nobody has been paid to take anything out of it.

Long-running knowledge artifacts do this. Documentation. Runbooks. Design specs. Sprint policies. Lint configs. Support macros. Internal wikis. The artifacts that were supposed to help eventually become the thing in the way. The systems we build to make us smarter make us louder instead.

I’ve been watching the same thing happen to my own AI dev workflow, only faster. Fast enough to actually study while it’s happening.

The bias

I’ve been running an advisory rule corpus on a side project for a few months. The usual shape: a folder of small markdown files telling the agents things to do or avoid, and a slash command that scans recent commits and suggests new rules from observed mistakes. Three months in, sixty-two active rules. The peak was seventy-two.

A commit message I wrote a couple of weeks ago:

Over 3 weeks we shipped 48 new rules and retired 2 (24:1 ratio), 58% of rules landed without MATCH_CONTENT, and the tuning log documents repeated cases of rules firing 5-6x on unrelated edits before being retroactively narrowed.

That’s me. Diagnosing my own additive bias inside my own commit. Twenty-four to one. I know better, and I was still doing it.

The pattern is everywhere. Almost nobody talks about what they had to dismantle, because dismantling doesn’t read as productive.

The fix isn’t smarter rules. It’s structural subtraction. Making removal as routine as addition, not just aspirationally.

A worked example: 27 hours and one field name

The bug, in shape: the system was talking to a wall. The channel I had wired it to use was one the listener didn’t read. Rules fired. Logs filled. The agent kept doing what it would have done anyway.

The specifics that made the wall a wall: when I shipped the first version of the guardian hook, it returned permissionDecisionReason. The docs were ambiguous and I picked the field that sounded right. The rules were advisory, so nothing was being blocked. But the agent’s behavior wasn’t shifting in any way that suggested it had read the text I was shipping.

I noticed it first at my desk one afternoon. The methodology running, the rules firing, the logs saying so. The agent acting as if nothing was there.

Twenty-seven hours of debugging with Claude later, we converged on the diagnosis. The commit message:

permissionDecisionReason only shows in the UI — the agent model never sees it. additionalContext is injected into the model’s context, which is what we need for advisory guardrails to actually reach the agent.

The fix is one line:

-      "permissionDecisionReason": $ESCAPED+      "additionalContext": $ESCAPED

That was the bug. The system was running. The rules were all there. The model never saw any of it.

Four hours after the fix, the methodology auto-generated a rule about the bug it had just made.

Two retirements

The cleanest case I caught was a rule meant to catch a specific kind of CSS mistake. The pattern looked fine. It sat in the corpus for weeks. Listed as active in every audit I ran. It never matched anything.

The reason was environmental. The pattern used a regex feature that the tool actually doing the matching didn’t support. The tool didn’t error. The match just silently always returned false. The rule was inert, registered, counted, and giving everyone a false sense that this category of mistake was being caught.

I caught it on a tuning pass, not because anything went wrong. I was reading down the active list and stopped on this one because I couldn’t remember it ever firing. That was the whole signal: a rule I couldn’t remember seeing work. The more interesting outcome was that the methodology had to grow a new practice on top of retirement. Visibility. How do you know which of your active rules are actually doing anything? Without that, you’re maintaining a corpus you can’t see.

The second case was simpler and worse. Another rule kept warning about behavior we’d shipped a fix for. It survived a merge and went on lying for a week. Why would it. Rules don’t watch other commits. The next agent to touch that area would have been told something the codebase no longer believed.

After that, every new rule shipped with a date or a trigger that retired it.

Small change. Does most of the work.

What this does and doesn’t teach

The system has to tell on itself. The reason I caught the additive ratio is that the slash command I use scans commit messages, and I write a lot of them with the agent. There are seventy-nine learning notes in this project. Without that signal, the bias would have been invisible. The methodology I rely on to study the methodology produced the diagnosis of the methodology. I’d like to claim I noticed first. I didn’t.

The design I described to myself when I started had more layers than the system has actually used. The middle one never materialized.

The assumption I haven’t tested: this whole approach requires a continuous gardener. Someone running tuning passes, retiring stale rules, watching the corpus the way you’d watch a long-running test suite for flakes. I don’t know what happens if I step away for a month. I don’t know if a colleague picking it up would inherit a working system or a half-dead orchard. Any post that sells you set-it-and-forget-it agent methodology is selling you something that doesn’t exist yet.

A smaller claim

Subtraction doesn’t make agents smarter. They’re as smart as the model. What it does is keep the noise floor low enough that the intelligence already there can actually do its work. Every dead rule, every duplicate, every silently broken pattern is a point of friction between a capable system and the thing you want it to do.

The AI workflow makes this fast enough to study, but the pattern is older than the workflow. Most long-running systems I’ve watched fail did so quietly, because nobody was paid to take things out. We staff additions. We reward shipping. The gardener role doesn’t get budgeted, and the orchard goes feral while everyone’s busy planting.

I had to figure that out the slow way, with a 24:1 ratio sitting in my git history pointing at me. Maybe that’s the part of the genre worth publishing.

· · ·

manukapoor.ca·May 2026