The Three-Layer Pattern for AI in Production Operations
TL;DR
Building something on the side with real users? The three-layer AI ops pattern (interface + memory + reasoning) cut triage time for known incident patterns from 15-30 min to ~30 sec. Open source, MIT licensed. Repo: https://github.com/ameno-/nano-sre
I’ve been responding to production incidents for over a decade. Most of that time on engineering teams with on-call rotations, runbooks, and incident commanders. I know what good triage looks like when you have the resources.
Lately I’ve been building something on the side as a learning exercise — a small project with real users, real infrastructure, and no team behind it. It’s been a great way to explore some ideas I’ve had about AI-assisted operations.
That constraint forced me to look at incident triage differently. The bottleneck wasn’t my monitoring tools. It wasn’t my experience — I’ve handled hundreds of production incidents across my career. The bottleneck was the gap between receiving an alert and having enough assembled context to make a decision.
Every tool I tried to close that gap failed for the same structural reason. They coupled things that needed to be independent.
That realization became NANO-SRE, an open-source incident triage stack. But the tool is secondary. The pattern underneath it — a three-layer separation for AI-augmented operations — is what I think matters, and it generalizes well beyond my situation.
The triage problem when you’re the entire team
I’ve been on both sides of incident response. On a team with an incident commander, a dedicated communicator, and engineers who can split up to investigate different hypotheses simultaneously — triage is a coordination problem. Without that support, it becomes a cognitive load problem.
Four failure modes that I kept running into:
Alert storm correlation failure. One root cause triggers 50+ downstream alerts. I understand the service topology because I built it, but understanding doesn’t help when I’m wading through a wall of notifications trying to isolate the signal. According to research from incident.io, teams receive over 2,000 alerts weekly, with only 3% needing immediate action.
No incident memory. “Didn’t I see this exact pattern three weeks ago?” That question comes up constantly because incident knowledge lives in my head, not my systems. On teams, we had post-mortems, runbooks, Jira tickets. When you’re also building features, handling support, and shipping updates, the documentation discipline slips. The 2025 Catchpoint SRE Report found overloaded teams see MTTR stretch to roughly 4 hours. Without institutional memory, you’re compounding that problem every time a known pattern recurs.
Context fragmentation. Good triage requires metrics (Grafana), errors (Sentry), logs (Loki), and deployment history. On a team, different people pull different sources. Without that, it’s six browser tabs and zero correlation. You become the integration layer between your own tools.
Decision paralysis under pressure. Rollback or hotfix? Config change or code fix? On a team, you can talk through the trade-offs. Without that support, it’s just you, incomplete data, and time pressure. I have years of experience making these calls, but experience doesn’t eliminate the cognitive cost of making those decisions without anyone to sanity-check you.
Everything I tried first (and why it failed)
I’ve shipped production systems for a long time. I’m not easily impressed by tooling demos. So when I tried to build AI-assisted triage, I started with the obvious approaches — and learned why they’re obvious but wrong:
| Approach | What Happened | What It Taught Me |
|---|---|---|
| Claude in Slack | Hit context limits after three correlated alerts. Useful for straightforward questions, completely broke down during multi-service triage with interleaved log data. | The conversational interface and the reasoning engine have fundamentally different context requirements. Separate them. |
| Vector database for logs | Semantic search took 15 seconds per query. Technically interesting, but when production is down and users are affected, 15 seconds feels like an eternity. | Triage requires speed over precision. Fast correlation beats perfect recall every time. |
| Monolith chatbot | Every new provider integration (Sentry, then Grafana, then Loki) broke something upstream. Classic coupling problem — the kind I’d never tolerate in production code but somehow accepted in tooling. | AI-augmented workflows need the same separation of concerns that makes any well-architected system work. |
Each failure refined the architecture. By the third iteration, the pattern was clear: three independent layers, loosely coupled, each optimized for its specific job.
The Three-Layer AI Operations Pattern
This is the architectural insight at the heart of NANO-SRE. It’s not novel in the way that a new algorithm is novel — it’s the application of separation of concerns to a domain where people keep building monoliths instead.
Layer 1: Conversational Interface
Component: Nanobot (Telegram-first)
The interface layer handles notification, routing, and human interaction. It needs to be fast, mobile-native, and zero-friction. This layer should be thin. It doesn’t need to be smart. It needs to get information in front of me quickly and let me act on it.
Why Telegram? Because I’m not always at my desk. When an incident hits, I need to assess the situation from wherever I am. Channel routing keeps incidents organized. Periodic summaries let me catch up asynchronously. Zero training required — which matters because “training” implies a team, and I don’t have one.
In practice, the vast majority of my initial incident responses happen from a phone. If you can’t triage from your phone, the tool fails exactly when it matters most.
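To make the "thin interface" point concrete, here’s a minimal sketch of what this layer does and nothing more: format an enriched alert and push it to a Telegram chat via the Bot API. This is not Nanobot’s code, and the alert fields (title, correlation_id, summary) are illustrative assumptions.

```python
# Minimal sketch of a thin interface layer: format and forward, no analysis.
# Not Nanobot's implementation; alert fields are illustrative assumptions.
import os

import requests

SEND_MESSAGE_URL = "https://api.telegram.org/bot{token}/sendMessage"


def notify(alert: dict, chat_id: str) -> None:
    """Push an enriched alert summary to a Telegram chat. Nothing smart happens here."""
    text = (
        f"ALERT: {alert['title']}\n"
        f"Correlation: {alert['correlation_id']}\n"
        f"{alert['summary']}"
    )
    requests.post(
        SEND_MESSAGE_URL.format(token=os.environ["TELEGRAM_BOT_TOKEN"]),
        json={"chat_id": chat_id, "text": text},
        timeout=5,
    )
```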
Layer 2: Data Aggregation & Memory
Component: Keep (open-source AIOps platform)
The data layer handles alert aggregation, persistent history, and automatic enrichment. Keep unifies alerts from Sentry, Grafana, Loki, and custom webhooks into a single source of truth. It correlates with recent deploys and previous incident patterns. Incident knowledge survives service restarts.
This is the layer I wish I’d had years ago on actual SRE teams. Institutional memory that doesn’t depend on anyone remembering to write a post-mortem. It’s the difference between diagnosing a known issue in seconds and burning thirty minutes rediscovering something you already solved.
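For illustration only — this is not Keep’s API — here is roughly what "enrichment" means in this layer: take a raw alert and attach recent deploys and similar past incidents before anything downstream sees it. Field names are assumptions.

```python
# Illustrative enrichment sketch -- not Keep's API. Field names are assumptions.
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class EnrichedAlert:
    raw: dict
    recent_deploys: list[dict] = field(default_factory=list)
    similar_incidents: list[dict] = field(default_factory=list)


def enrich(alert: dict, deploys: list[dict], history: list[dict]) -> EnrichedAlert:
    """Attach deploys from the last 30 minutes and past incidents on the same service."""
    fired_at = datetime.fromisoformat(alert["fired_at"])
    cutoff = fired_at - timedelta(minutes=30)
    return EnrichedAlert(
        raw=alert,
        recent_deploys=[
            d for d in deploys if datetime.fromisoformat(d["deployed_at"]) >= cutoff
        ],
        similar_incidents=[i for i in history if i["service"] == alert["service"]],
    )
```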
Layer 3: Deep Reasoning Engine
Component: Custom triage harness (built on Pi Agent SDK)
The reasoning layer handles the cognitive heavy lifting: cross-service correlation, pattern matching against historical incidents, and generating structured verdicts with severity, category, confidence, and suggested actions. It produces HTML artifacts — portable incident reports that serve as automatic documentation.
Unlike a chatbot constrained by message-length limits, the harness maintains extended context for long triage sessions. It catches patterns I’d miss under pressure — not because I lack the experience to spot them, but because no one can hold six data sources in working memory simultaneously while also deciding what to do about them.
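A sketch of the verdict shape this layer produces, using the fields named above (severity, category, confidence, suggested actions). The actual schema in the repo may differ; the point is that the output is structured data and a portable report, not chat text.

```python
# Sketch of a structured verdict; the repo's actual schema may differ.
from dataclasses import dataclass


@dataclass
class TriageVerdict:
    severity: str                 # e.g. "critical", "high", "medium", "low"
    category: str                 # e.g. "database", "deploy", "dependency"
    confidence: float             # 0.0 - 1.0
    summary: str                  # one-paragraph root-cause hypothesis
    suggested_actions: list[str]
    matched_incidents: list[str]  # IDs of similar historical incidents


def to_report(verdict: TriageVerdict) -> str:
    """Render the verdict as a minimal, portable HTML incident report."""
    actions = "".join(f"<li>{a}</li>" for a in verdict.suggested_actions)
    return (
        f"<h1>{verdict.severity.upper()} / {verdict.category}</h1>"
        f"<p>Confidence: {verdict.confidence:.0%}</p>"
        f"<p>{verdict.summary}</p>"
        f"<ul>{actions}</ul>"
    )
```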
Why the separation matters
Each layer evolves independently. Swap models — Claude to Gemini to a local model — without touching the chat interface. Add alert providers without changing reasoning logic. Replace Telegram with Slack (on the roadmap) without rebuilding enrichment or analysis.
I applied the same principle to this stack that I’d apply to any production system I’ve ever built: loosely coupled components with clear contracts. It’s not exciting architecture. It’s correct architecture.
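In code terms, "clear contracts" looks something like this — each layer depends on a narrow interface rather than a concrete implementation. The class and method names below are illustrative, not the repo’s.

```python
# Illustrative contracts between layers; names are not the repo's.
from typing import Protocol


class ChatChannel(Protocol):
    def send(self, chat_id: str, text: str) -> None: ...


class ReasoningEngine(Protocol):
    def triage(self, enriched_alert: dict) -> dict: ...


def handle_alert(alert: dict, channel: ChatChannel, engine: ReasoningEngine) -> None:
    """Interface and reasoning are injected, so either can be swapped independently."""
    verdict = engine.triage(alert)
    channel.send(alert["chat_id"], verdict["summary"])
```

Swapping Telegram for Slack means a new ChatChannel implementation; swapping models means a new ReasoningEngine. Nothing else moves.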
Best model for each phase
One thing that fell out of the separation: using different models for different triage phases became trivial.
When alerts include screenshots — Sentry error captures, Grafana dashboard snapshots, user-reported visual bugs — image-first analysis is a real advantage. Gemini Flash handles multimodal triage at low latency and low cost. Across dozens of my own production incidents, image-first analysis consistently surfaced initial hypotheses faster than text-only approaches, because dashboard screenshots give the model spatial information that structured log text can’t provide.
Then the harness escalates to Claude Sonnet for deeper causal reasoning when the situation is complex. Best model for each phase — not one model forced to handle everything. The three-layer separation makes this a configuration change, not a refactor.
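Here’s a rough sketch of what that configuration change looks like against OpenRouter’s OpenAI-compatible API. The model IDs are examples — check OpenRouter’s catalog for current names — and the phase names are my own illustration, not the harness’s config keys.

```python
# Phase-based model routing via OpenRouter's OpenAI-compatible endpoint.
# Model IDs are examples; phase names are illustrative, not the harness's keys.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

PHASE_MODELS = {
    "first_pass": "google/gemini-flash-1.5",          # fast, cheap, multimodal
    "deep_reasoning": "anthropic/claude-3.5-sonnet",  # escalation for complex incidents
}


def run_phase(phase: str, messages: list[dict]) -> str:
    """Route a triage phase to its configured model -- a config change, not a refactor."""
    response = client.chat.completions.create(
        model=PHASE_MODELS[phase],
        messages=messages,
    )
    return response.choices[0].message.content
```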
Real triage: from alert to action
Here’s the flow when a known pattern recurs — the best case that demonstrates why incident memory matters:
```
0s        2s          7s         8s               23s        30s
│─────────│───────────│──────────│────────────────│──────────│
Alert     Keep        Nanobot    Harness          You        Root cause
fires     enriches    notifies   analyzes         decide     identified
          +correlates            +historical
          +deploys                match
```

- Alert fires (0-2s) — Loki catches elevated error rate across the payment service.
- Keep enriches (2-7s) — Correlates with a deployment 17 minutes prior. Retrieves previous connection pool incidents. Tags similar patterns.
- Nanobot notifies (instant) — Telegram message with correlation ID and enrichment summary.
- Harness analyzes (8-23s) — Checks log patterns against historical matches, assesses related service health, generates a structured verdict.
- You decide (23-30s) — HTML report: “High confidence. Database connection pool exhausted. Previous pattern match. Suggested action: restart pool, increase max_connections from 100 to 200.”
Root cause identified. Historical context delivered. Specific remediation suggested. No tab-switching. No re-diagnosis.
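For reference, the structured data behind that report looks roughly like this — the values come from the example above, the schema is illustrative:

```python
# The example verdict above as structured data; schema is illustrative.
verdict = {
    "severity": "high",
    "category": "database",
    "confidence": 0.9,  # "High confidence"
    "summary": "Database connection pool exhausted; matches a previous incident pattern.",
    "suggested_actions": [
        "Restart the connection pool",
        "Increase max_connections from 100 to 200",
    ],
}
```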
Novel incidents take longer. But even then, the enrichment and correlation steps eliminate the manual context-assembly that eats most of triage time. After years of doing that assembly in my head, having a system do it for me is a meaningful quality-of-life improvement.
Before and after
| Dimension | Before (Manual Triage) | After (Three-Layer Pattern) |
|---|---|---|
| Time to root cause | 15-30 minutes of tab-switching and correlation | ~30 seconds for known patterns; minutes for novel incidents |
| Context correlation | Manual — you’re the integration layer across 6+ tabs | Automatic — Keep unifies Sentry, Grafana, Loki, deploy history |
| Incident memory | Your head. Post-mortems you might write if you have time | Persistent. Searchable. Every incident enriches future triage |
| Mobile capability | Laptop required. 5+ minutes to get situated | Telegram-native. Assess and act from your phone |
| Cognitive load | High — assembling context while under pressure | Low — system assembles context; you make one decision |
| Model flexibility | N/A or single vendor | Model-agnostic via OpenRouter |
| Cost | Enterprise licenses (built for teams, priced for teams) | MIT-licensed. API costs only, on your terms |
| Knowledge preservation | Inconsistent at best when you’re also the only developer | Automatic structured reports from every incident |
Open source, because infrastructure tooling should be auditable
Everything ships MIT.
- Audit every line. When your incident response depends on a system, you should see what’s running. I’ve spent enough of my career debugging vendor black boxes.
- Fork and customize. Modify reasoning logic, add proprietary providers, tune thresholds for your environment.
- Cost control. Run with OpenRouter’s free-tier models during development, pin to paid models for production. Every dollar matters and I control where they go.
Five minutes from clone to running
```bash
git clone https://github.com/ameno-/nano-sre.git
cd nano-sre
./scripts/bootstrap.sh
cp .env.example .env
# Add OPENROUTER_API_KEY and provider credentials
./scripts/up.sh

# Keep UI:  http://localhost:38000
# Harness:  http://localhost:38790/healthz
# Nanobot:  configure the Telegram webhook per docs
```

Same .env contract local and production. Full setup in the README. Provider integration guide in docs/ADDING_PROVIDERS.md.
Honest limitations
No built-in alerting rules — use your existing Prometheus, Grafana, or whatever you’ve got. Telegram-only for now (Slack on the roadmap). Self-hosted only. Requires Docker proficiency.
If you need monitoring from scratch, this isn’t it. NANO-SRE assumes you already have alerts and focuses on making triage work when you’re the only one responding to them.
Where this pattern goes next
The three-layer separation applies to any workflow requiring time-pressured decisions with incomplete information. That’s most operational work.
Adapting for security incident response means swapping Sentry for Wiz or Falco alerts — the reasoning layer barely changes. For customer support triage, swap Loki for Zendesk ticket streams. For deployment validation, the enrichment layer correlates canary metrics while the reasoning layer generates go/no-go verdicts.
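Concretely, most of that adaptation is a normalization change at the edge — map the new source into the same alert shape and the reasoning layer never notices. A hypothetical sketch (provider field names are assumptions):

```python
# Hypothetical normalizers -- provider field names are assumptions.
# Each domain maps its source into one alert shape; the reasoning layer is unchanged.
def normalize_sentry(event: dict) -> dict:
    return {"service": event["project"], "title": event["title"], "payload": event}


def normalize_falco(event: dict) -> dict:  # security incident response variant
    return {"service": event["hostname"], "title": event["rule"], "payload": event}


def normalize_zendesk(ticket: dict) -> dict:  # customer support triage variant
    return {"service": "support", "title": ticket["subject"], "payload": ticket}
```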
My prediction: In 18 months, every team running production infrastructure will have some version of this separation. The models will improve. The integrations will multiply. But the pattern — loose coupling between interface, memory, and intelligence — will endure. It follows the same principles that make all resilient distributed systems work. I’ve seen enough production architectures to recognize a durable pattern when I find one.
The origin: what OpenClaw taught me
I built NANO-SRE after months contributing to OpenClaw (back when it was still called Clawdbot). That experience crystallized something I’d felt throughout my career: the model isn’t the bottleneck — the architecture around the model is.
OpenClaw demonstrated that AI agents need persistent memory, multi-channel interfaces, and modular tool integration to be useful. NANO-SRE applies those principles to the domain I’ve spent the most time in — production infrastructure — where getting the architecture right isn’t academic. It directly determines how fast you resolve incidents and how sustainable the work is long-term.
Try the pattern
Clone the repo. Run ./scripts/bootstrap.sh. See if the separation makes your triage faster.
Open a GitHub issue if you hit a problem. See docs/PR_REVIEW_GUIDE.md if you want to contribute a provider.
If you’re building AI into operational workflows — incident triage, support, security, deployment validation — I work with engineering teams on these patterns. Years of production incident response plus hands-on AI engineering. If your team is navigating where to put the intelligence, I’ve probably already tried the approach you’re considering and can tell you what happens next.
Repository: github.com/ameno-/nano-sre
License: MIT
Stack: Nanobot + Keep + Pi Agent SDK + OpenRouter
Pattern: The Three-Layer AI Operations Pattern
References
- Alert fatigue solutions for DevOps teams (incident.io, 2025) — 2,000+ weekly alerts, only 3% actionable
- State of Incident Management 2025 (Runframe) — Toil rose to 30% despite AI investment; 73% of orgs had outages from ignored alerts
- AIOps for SRE (DevOps.com) — 70% of SREs report on-call stress impacts burnout
- Alert fatigue reduction with AI agents (IBM) — 4,484 alerts/day average; 67% ignored
- Understanding alert fatigue (Atlassian)
- OpenClaw (Wikipedia)
Key Takeaways
- The bottleneck in operations isn't monitoring — it's context assembly between alerts and decisions
- Three-layer separation (interface + memory + reasoning) mirrors production architecture patterns
- Known incident patterns go from alert to root cause in ~30 seconds vs 15-30 minutes
- Model-agnostic design falls out naturally from loose coupling
- Pattern generalizes to security response, support triage, and deployment validation