Code Review in Under Two Minutes. Here's What Changed.
Most engineering teams have the same code review problem. The senior engineers are bottlenecked. The junior engineers wait. The PR sits open for two days. By the time someone actually reads the diff, half the context is gone and the author has already moved on to the next ticket.
We have spent the last few months building KavrynOS — a desktop workspace that wraps JIRA, Bitbucket, and Claude into one app — and the feature that has changed how my team actually works is the AI-powered PR review. Not the marketing version. The real one. Here is what the workflow now looks like, what it found, and what it did not.
The Old Loop
Before KavrynOS, our review process for a payments service PR looked like this:
- Author opens PR. Pings reviewer in Slack.
- Reviewer opens five tabs: Bitbucket, JIRA ticket, Confluence design doc, two related repos.
- Reviewer reads diff. Misses two of the four real issues because the file is 600 lines and the eye glazes by line 400.
- Reviewer leaves a comment about variable naming. Submits review.
- Author fixes the variable name. Merges.
- Two weeks later, the missed issues become a Sev-2 incident at 11pm on a Tuesday.
The diff was reviewed. The code was not.
What Two Minutes Actually Looks Like
The new loop runs inside KavrynOS. The reviewer picks the repo, the source branch, and the destination branch. Claude reads the entire diff plus context the agent has cached from the codebase scan we ran when we onboarded the repo.
In about ninety seconds, the panel shows:
- A verdict. Approve, Request Changes, or Discussion. The verdict is a strong opinion, not a guess. Claude tends to say "Discussion" when something is technically correct but architecturally surprising.
- A findings table. Severity-ranked. Each finding includes the file path, the line number, and a one-sentence explanation of what is wrong and why.
- Inline comments. Pre-written, scoped to the exact line, ready to post.
Then there are three buttons: Post Review, Approve, Request Changes. One click posts the full review to Bitbucket as a general PR comment plus targeted inline comments. Not a wall of generic feedback. The actual structured review, with line numbers, posted as a first-class API call.
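For the curious, Bitbucket Cloud's 2.0 REST API does accept inline comments as a structured JSON body. The sketch below follows the public payload shape; the paths, line numbers, and comment text are placeholders, not KavrynOS internals.

```typescript
// Sketch of the Bitbucket Cloud 2.0 payload for one inline PR comment.
// All values are placeholders; auth and error handling are omitted.
function buildInlineComment(path: string, line: number, text: string) {
  return { content: { raw: text }, inline: { path, to: line } };
}

// const url = `https://api.bitbucket.org/2.0/repositories/${workspace}/${repoSlug}/pullrequests/${prId}/comments`;
// await fetch(url, {
//   method: "POST",
//   headers: { "Content-Type": "application/json", Authorization: `Bearer ${token}` },
//   body: JSON.stringify(buildInlineComment("src/payments/charge.ts", 42, "Missing idempotency key")),
// });
```

One API call per finding is what makes the review land as first-class Bitbucket comments rather than one pasted wall of text.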
What It Caught On Real PRs
I pulled the findings from our last 80 reviews. The high-severity issues fell into a few patterns I would not have caught on a rushed read:
Idempotency on payment paths. Three separate PRs added new endpoints that processed money without an idempotency key. All three would have replayed on webhook retry. None were caught by tests.
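The fix for this class of bug is small, which is exactly why it gets skipped. A minimal sketch, assuming the caller sends an idempotency key with each request (the names here are illustrative, not our real service code):

```typescript
// Minimal sketch of idempotency-key handling on a payment path.
// A real service would persist results in a shared store, not a Map.
type ChargeResult = { status: number; body: string };

const processed = new Map<string, ChargeResult>();

function handleCharge(idempotencyKey: string, charge: () => ChargeResult): ChargeResult {
  const prior = processed.get(idempotencyKey);
  if (prior) return prior; // webhook retry replays the request: return the cached result, charge once
  const result = charge();
  processed.set(idempotencyKey, result);
  return result;
}
```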
Silent error swallowing. Eleven instances of catch (e) {} in code that needed at least a structured log. The reviewer reading the diff at 4pm on Friday is not catching these. Claude does, every time.
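The suggested fix in every one of those eleven comments was the same shape: log something structured, then decide deliberately whether to rethrow. A sketch (field names are illustrative):

```typescript
// What an empty catch block should become at minimum: a structured
// log line, then an explicit decision to rethrow or handle.
function logAndRethrow(e: unknown, context: Record<string, unknown>): never {
  console.error(JSON.stringify({ level: "error", err: String(e), ...context }));
  throw e;
}

function capturePayment() {
  try {
    throw new Error("card declined"); // stand-in for the risky call
  } catch (e) {
    logAndRethrow(e, { op: "capture", orderId: "o_123" }); // instead of catch (e) {}
  }
}
```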
Timezone-sensitive date parsing. Five instances of new Date(string) where the string came from the API in UTC and was parsed as local. This is the kind of bug that ships, then bites three months later when a customer in Sydney sees the wrong day on an invoice.
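The trap is in the spec: a date-time string without an explicit offset is parsed in the local timezone of whatever machine runs the code. One defensive fix, assuming the API always sends UTC, is to append an offset when none is present:

```typescript
// The trap: no offset in the string means local-time parsing.
const naive = new Date("2024-03-10T02:30:00"); // local time on whatever box parses it

// Defensive fix, assuming the API always sends UTC: append "Z"
// when the string carries no explicit offset.
function parseUtc(s: string): Date {
  return /(?:Z|[+-]\d{2}:?\d{2})$/.test(s) ? new Date(s) : new Date(s + "Z");
}
```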
Missing pagination on list endpoints. Two PRs added endpoints that returned all rows. Fine in dev. A 30-second timeout in production once the table grew.
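The smallest version of the fix is plain limit/offset. A sketch (the parameter names and the 50-row default are illustrative, not our API contract):

```typescript
// Minimal limit/offset pagination: never return every row.
// A growing table behind this stays O(limit) per request instead of O(table).
function paginate<T>(rows: T[], limit = 50, offset = 0): { items: T[]; nextOffset: number | null } {
  const items = rows.slice(offset, offset + limit);
  const nextOffset = offset + limit < rows.length ? offset + limit : null;
  return { items, nextOffset };
}
```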
These are not exotic findings. They are the kind of thing a careful senior reviewer should catch — but does not, when the queue has six PRs in it and the next meeting is in ten minutes.
What It Doesn't Do
Two months in, I can also tell you what Claude does not do well. It is only fair to publish this list:
It does not catch architectural mistakes. If you put a new service in the wrong place, the AI sees the diff and approves the local logic. It cannot tell you that the service should not exist. That is still a senior conversation.
It does not understand "we agreed last sprint that we wouldn't do X." It only sees the code and the cached knowledge base. Tribal knowledge that lives in Slack DMs is invisible to it.
It is occasionally pedantic about style in a way that is not useful. We have tuned the prompt — the Prompts Manager is editable from inside the app — to suppress style nits unless they affect correctness.
The Re-Review Queue Is Quietly The Best Part
The feature I did not expect to use, and now use every day, is the re-review queue. When the author pushes fixes, I mark the PR for re-review. KavrynOS pre-fills a new review request with the original findings as context. Claude reads the new diff and tells me whether each prior finding has been addressed — and it flags new issues introduced by the fix, which is where the actual second incident usually hides.
I no longer have to remember what I asked for two days ago. The agent does.
Two Minutes Is The Wrong Number
The headline is two minutes. The real number is what the reviewer does with the rest of the time. Before KavrynOS, my senior engineers spent two to three hours a day on review. With the AI handling the first pass, they spend about thirty minutes a day reviewing — and the conversations they have are about architecture, trade-offs, and whether the change should exist at all. Not whether the variable should be userIds or user_ids.
That is the actual win. Not faster reviews. Better conversations.
If your team is shipping more PRs than your senior engineers can read carefully, AI review is not optional anymore. The cost of a missed bug at 11pm on a Tuesday is much higher than the cost of running Claude across the diff.