AI Strategy · April 24, 2026 · 6 min read

Prompt as Configuration: Editing AI Without Touching the Code

Marcus Rivera

Principal Engineer

@marcusbuilds
#prompt-engineering #kavrynos #ai-ops #claude #developer-productivity

Six months ago, "improving the AI" at our customer's engineering org meant filing a Linear ticket, waiting for a sprint to free up, getting a senior engineer to read the prompt change carefully, doing a code review on a .txt file, deploying, and watching for regressions. Total turnaround for a one-line tweak: roughly four days.

This is absurd, and it is the default.

The premise of treating prompts like configuration — not like code — is that the people who know what the AI should say are usually not the people with the deploy keys. The fix is to take the prompt out of the binary and put it behind a UI that the right humans can edit, version, and roll back. We built this into KavrynOS as the Prompts Manager, and the second-order effects have been bigger than I expected.

The Problem With Prompts In Code

The original argument for keeping prompts in the codebase was correctness. If the prompt is in code, it goes through review. It cannot drift. It is versioned with the rest of the system. These are real benefits.

The argument falls apart in three places.

The reviewers are wrong. A senior engineer reviewing a prompt change is rarely the right person to evaluate whether the prompt produces good output. The PM who has been watching the agent's drafts for two weeks knows. The team lead who reads every triage post knows. The engineer reviewing the diff sees only the words, not the resulting behavior, and approves on syntax.

The cycle time is fatal. Prompt iteration wants tight feedback. You read an output, you spot the failure mode, you adjust three words, you read a new output. With prompts in code, this loop runs at sprint speed. By the time the change ships, the failure mode has been seen forty more times.

The deployment surface is wrong. Most prompt changes do not need a deploy. They need an edit. Treating every adjustment as a code change burns CI time, eats reviewer attention, and creates the impression — accurate — that the AI is fragile and risky to touch.

The conclusion is that prompts have a different lifecycle than code. Treating them as code is a category mistake.

What Prompts Manager Actually Does

In KavrynOS, every Claude prompt powering the app — PR review system prompt, KB generation instructions, MCP triage drafting prompt, chat system prompt — lives in a database row. The Prompts Manager is a UI for editing those rows.
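The shape of this is simple enough to sketch. Here is a minimal illustration of prompt-as-data lookup, assuming a `prompts` table keyed by a stable slug; the names (`getPrompt`, `PromptRow`, the slug values) are mine, not the actual KavrynOS schema or API:

```typescript
type PromptRow = {
  slug: string;        // stable key, e.g. "pr-review-system"
  body: string;        // the prompt text itself
  updatedAt: string;   // ISO timestamp of the last edit
  updatedBy: string;   // who made the edit
};

// In-memory stand-in for the database table.
const prompts = new Map<string, PromptRow>([
  ["pr-review-system", {
    slug: "pr-review-system",
    body: "Review this pull request. Skip nits unless they affect correctness.",
    updatedAt: "2026-04-20T14:03:00Z",
    updatedBy: "team-lead",
  }],
]);

// The app reads the prompt at call time, so an edit takes effect on the
// very next request -- no build, no deploy.
function getPrompt(slug: string): string {
  const row = prompts.get(slug);
  if (!row) throw new Error(`Unknown prompt: ${slug}`);
  return row.body;
}
```

The design choice that matters is the read-at-call-time lookup: because the prompt is fetched fresh for each request rather than baked into the binary, saving a row is the deployment.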

The interface is unfussy: a category sidebar, a list of prompts in each category, and a full-screen textarea for the active prompt. Each prompt shows which feature uses it, when it was last edited, and by whom. There is a Save button and a Revert button. There is no deploy.

The result is that prompt iteration runs at human speed instead of pipeline speed.

The Failure Modes That Were Quietly Killing Us

I want to share three specific failure modes Prompts Manager fixed, because they generalize.

Voice drift in autonomous drafts. The MCP triage agent was generating ticket replies that sounded slightly off — too formal, with a hint of generic SaaS English. The fix was a four-line change to the drafting prompt: "Match the team's existing tone. Use 'we' for the team. Avoid 'reach out' and 'circle back'." That edit took twelve seconds. In the old workflow it would have taken three days.

False-positive findings in PR review. Claude was flagging style issues in a way the team found pedantic. The fix was a single sentence in the review prompt: "Skip nits unless they affect correctness." The team lead made the edit during a meeting after seeing the third pedantic finding of the morning. Output quality improved that afternoon.

KB generation noise. The KB generator was including auto-generated TypeScript types in the architecture summary, which was making the docs longer and less useful. The fix was a paragraph in the KB generation prompt explicitly listing what to ignore. A senior engineer made the edit while waiting for a build to finish.

None of these failures would have been caught by tests. They were observable only by reading the output. The fix was always small. The friction of going through code review was the entire reason they had not been fixed already.

Who Should Have The Edit Button

The right answer for most teams is: not engineers.

The people who notice prompt failures are the people who use the agent's output in their daily work. For PR review, that is the senior engineers. For triage drafts, that is the team lead or PM. For KB generation, it is the engineer who is going to read the docs.

In KavrynOS we tied this to the Auth & Roles system. Admins can edit prompts. Editors can edit prompts. Viewers cannot. The team can decide, per-prompt, who should have the right to tune it. The default for most teams ends up being: editor role for whoever owns that part of the product.
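A per-prompt permission check along these lines is a few lines of code. This is a sketch under my own assumptions about the shape of the check, not the real KavrynOS Auth & Roles API; the role names mirror the article:

```typescript
type Role = "admin" | "editor" | "viewer";

interface User {
  id: string;
  role: Role;
}

interface PromptAcl {
  slug: string;
  // Optional per-prompt allowlist: when present, only the named editors
  // may tune this prompt. Admins can always edit.
  editors?: string[];
}

function canEditPrompt(user: User, acl: PromptAcl): boolean {
  if (user.role === "admin") return true;
  if (user.role === "viewer") return false;
  // Editors: allowed unless the prompt restricts edits to named owners.
  return acl.editors ? acl.editors.includes(user.id) : true;
}
```

The optional allowlist is what makes "editor role for whoever owns that part of the product" expressible without inventing a new role per prompt.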

This is the inversion that matters. The person closest to the failure becomes the person who fixes it. The friction of going through code review — the friction that was the entire reason your AI was bad in the corner cases — disappears.

What This Costs

I want to flag the trade-offs.

Audit becomes harder. Code review on a prompt change at least guaranteed a second pair of eyes. With Prompts Manager, an editor can change behavior alone. We mitigate this with a full edit log — every save records who, when, and the diff — and we surface unusual changes in the admin panel. But it is genuinely less rigorous than code review, and a team that needs SOC 2 levels of change control should configure Prompts Manager accordingly.
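An edit log of that shape is cheap to keep. Here is a sketch, with field names of my choosing rather than the real schema, of recording who, when, and the change on every save:

```typescript
interface PromptEdit {
  slug: string;
  editedBy: string;
  editedAt: string;   // ISO timestamp
  before: string;     // body prior to the save
  after: string;      // body after the save
}

// Append-only log; "unusual change" surfacing can scan this.
const editLog: PromptEdit[] = [];

function savePrompt(
  slug: string,
  currentBody: string,
  newBody: string,
  user: string,
): void {
  editLog.push({
    slug,
    editedBy: user,
    editedAt: new Date().toISOString(),
    before: currentBody,
    after: newBody,
  });
  // ...persist newBody to the prompts table here...
}
```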

Bad prompts can ship. A misguided edit to the PR review prompt could degrade review quality across the team for hours before someone notices. The Revert button is one click; the time-to-notice is what matters. Teams that take prompt edits seriously have built the habit of watching the next ten outputs after a save. This is a discipline, not a feature.
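One-click revert falls out of keeping versions rather than overwriting in place. A sketch, under the assumption that each prompt keeps a newest-last version history (the names here are illustrative):

```typescript
interface PromptVersion {
  body: string;
  savedAt: string;   // ISO timestamp
}

// Per-prompt version history, newest last.
const history = new Map<string, PromptVersion[]>();

function save(slug: string, body: string): void {
  const versions = history.get(slug) ?? [];
  versions.push({ body, savedAt: new Date().toISOString() });
  history.set(slug, versions);
}

// Revert = drop the latest version and make the previous one current.
function revert(slug: string): string {
  const versions = history.get(slug);
  if (!versions || versions.length < 2) {
    throw new Error(`No earlier version of ${slug} to revert to`);
  }
  versions.pop();
  return versions[versions.length - 1].body;
}
```

Because nothing is overwritten, revert is itself just another state transition, which is what makes it safe to hand the button to non-engineers.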

Ownership ambiguity. When a prompt produces a bad output, who is on the hook? The engineer who wrote the original prompt? The PM who edited it last? The team lead who approved the edit? Most teams settle on "whoever last edited it owns it," but this is a culture decision, not a tooling one.

These are real costs. They are also the costs of allowing fast iteration on AI behavior. The alternative — slow iteration that protects against bad edits — produces an agent that is consistently mediocre, which is worse than an agent that occasionally has a bad afternoon and is rolled back.

The Generalized Lesson

Treat prompts like configuration, not like code.

Configuration changes do not need a deploy. They need a UI. They need an edit log. They need rollback. They need permissions. They do not need pull requests.

The teams that have shipped good agents over the last year all share this property: the people who notice when the agent is wrong are also the people who can fix it, and the loop between noticing and fixing is measured in minutes. Not days.

If your team is iterating on AI behavior through Linear tickets, you are not going to ship a good agent. The cycle time is the constraint.

See the Prompts Manager in KavrynOS →
