7 Incidents in One Day: What Human + AI Code Review Actually Looks Like
Real incident data from a live development session. Not theory—actual findings with timestamps and categorization.
On January 24, 2026, during a routine code improvement session, we documented something unexpected: 7 distinct incidents caught in a single day. What emerged wasn't just a list of bugs—it was a clear pattern showing exactly where humans and AI each excel.
The Raw Data
Here's what we found, categorized by who caught it:
Caught by AI (Systematic Review): 5 Incidents
- Zero Test Coverage: No tests in the entire 250K+ line codebase
- Client-Side Role Assignments: Security vulnerability—roles determined on client
- Dual Firestore Collections: Same data stored in two places
- Hardcoded Client Data: Real company name embedded in code
- Styling Documentation Gap: No documented decision on styling approach
Caught by Human (Judgment Calls): 2 Incidents
- Functionality Removal: AI accidentally deleted a working "Enter Board" button during refactoring
- Unfounded Recommendation: AI suggested migrating to Tailwind without investigating—turns out the codebase was 99% inline styles by design
The Pattern
This wasn't random. A clear pattern emerged:
AI excels at finding. Systematic review, pattern matching, exhaustive search. The AI scanned thousands of files and identified issues that would take a human days to find. Test coverage? Counted. Security patterns? Analyzed. Data architecture? Mapped.
Human excels at filtering. Both human-caught incidents were judgment errors—places where the AI "followed the rules" but missed the intent. Removing the button was technically cleaning up code. Suggesting Tailwind was technically addressing "inconsistency." But a human immediately recognized: "Wait, we need that button" and "Wait, why would we change 2,400 style blocks for no reason?"
The Unfounded Recommendation Incident
This one deserves special attention because it almost caused 10-20 hours of unnecessary work.
The AI recommended: "Standardize styling (migrate inline → Tailwind)" as a medium-priority "cleanup" task.
The human asked: "Is there a reason we chose one over the other?"
The AI investigated (for the first time) and found:
- `style={}` (inline): 2,429 usages
- `className=` (Tailwind): 11 usages
A 220:1 ratio. This wasn't inconsistency—it was a deliberate architectural decision. The AI had recommended a major pivot based on assumption, not evidence.
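If you want to run this kind of check on your own codebase, here is a minimal sketch, assuming a Node.js environment and React-style components under `./src`. The script, the path, and the file extensions are illustrative; this is not the tooling used in the original session.

```typescript
// count-style-usage.ts
// Illustrative sketch: tally JSX inline style props vs className props.
import { readdirSync, readFileSync, statSync } from "fs";
import { join, extname } from "path";

const SOURCE_DIR = "./src"; // hypothetical project layout
const EXTENSIONS = new Set([".tsx", ".jsx", ".ts", ".js"]);

// Recursively collect source files under a directory.
function walk(dir: string, files: string[] = []): string[] {
  for (const entry of readdirSync(dir)) {
    const full = join(dir, entry);
    if (statSync(full).isDirectory()) {
      walk(full, files);
    } else if (EXTENSIONS.has(extname(full))) {
      files.push(full);
    }
  }
  return files;
}

let inlineCount = 0;
let classNameCount = 0;

for (const file of walk(SOURCE_DIR)) {
  const source = readFileSync(file, "utf8");
  inlineCount += (source.match(/style=\{/g) ?? []).length;
  classNameCount += (source.match(/className=/g) ?? []).length;
}

console.log(`style={} (inline):     ${inlineCount}`);
console.log(`className= (Tailwind): ${classNameCount}`);
console.log(`approximate ratio: ${Math.round(inlineCount / Math.max(classNameCount, 1))}:1`);
```

Run it with ts-node (or compile it first) and it prints the two counts and an approximate ratio. The point isn't the script itself: it's that gathering this evidence takes minutes, and it should happen before a migration is recommended, not after.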
What We Changed
This session led to two framework improvements:
1. The Agent Constitution
We created 10 concise rules that the AI checks before every action. Rule 5 (ROOT): "No workarounds, find the real problem." Rule 6 (VALIDATE): "Test technology choices before committing."
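As a rough illustration of what "checks before every action" can mean in practice, the two rules quoted above could be encoded as data that an agent harness injects into each turn. This is a sketch under our own assumptions: the `ConstitutionRule` shape and `renderConstitution` helper are hypothetical, and the other eight rules are deliberately left out.

```typescript
// Illustrative only: constitution rules as data the agent consults before acting.
interface ConstitutionRule {
  id: number;
  name: string;
  text: string;
}

const AGENT_CONSTITUTION: ConstitutionRule[] = [
  // ...rules 1-4 elided...
  { id: 5, name: "ROOT", text: "No workarounds, find the real problem." },
  { id: 6, name: "VALIDATE", text: "Test technology choices before committing." },
  // ...rules 7-10 elided...
];

// Rendered into the system prompt (or a pre-action check) on every turn.
function renderConstitution(rules: ConstitutionRule[]): string {
  return rules.map((r) => `Rule ${r.id} (${r.name}): ${r.text}`).join("\n");
}
```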
2. The Recommendation Protocol
Before recommending any change, the AI must now do all of the following (a code sketch follows the list):
- Investigate current state
- Understand why it's that way
- Provide evidence of the problem
- Assess trade-offs
- Label it correctly (cleanup vs. refactor vs. architecture change)
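One way to make the protocol concrete is to treat it as a required shape that any recommendation must satisfy before it reaches a human. The sketch below is ours, not part of the documented framework: the type names, fields, and `isWellFounded` check are illustrative.

```typescript
// Illustrative only: the recommendation protocol expressed as a data shape.
type ChangeLabel = "cleanup" | "refactor" | "architecture change";

interface Recommendation {
  summary: string;             // e.g. "Standardize styling"
  currentState: string;        // what the codebase actually does today
  whyItIsThatWay: string;      // the rationale behind the status quo
  evidenceOfProblem: string[]; // concrete findings: counts, failures, reports
  tradeOffs: string[];         // costs and risks of changing vs. not changing
  label: ChangeLabel;          // cleanup vs. refactor vs. architecture change
}

// Reject any recommendation that skips the investigation steps.
function isWellFounded(rec: Recommendation): boolean {
  return (
    rec.currentState.trim().length > 0 &&
    rec.whyItIsThatWay.trim().length > 0 &&
    rec.evidenceOfProblem.length > 0 &&
    rec.tradeOffs.length > 0
  );
}
```

Under this framing, the Tailwind suggestion would never have reached a human as written: it had a summary and a label, but no investigated current state, no evidence, and no trade-off analysis.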
The Takeaway
The human didn't catch more incidents than the AI. The human caught different incidents—the ones that required understanding intent, context, and business value.
This is why we believe in Human + AI, not AI alone. The combination caught more issues in one day than either would have alone. And more importantly, it prevented a multi-day detour into unnecessary refactoring.
The AI is excellent at systematic analysis. The human is essential for maintaining direction. Together, they're formidable.
Appendix: All 7 Incident Reports
Each incident was formally documented with root cause analysis and prevention measures. The full reports are available in our methodology documentation for teams implementing similar Human + AI workflows.