The Three-Layer Pattern for AI in Production Operations
TL;DR
Building something on the side with real users? The three-layer AI ops pattern (interface + memory + reasoning) cut triage time for known incident patterns from 15-30 min to ~30 sec. Open source, MIT licensed. Repo: https://github.com/ameno-/nano-sre
I’ve been responding to production incidents for over a decade. Most of that time on engineering teams with on-call rotations, runbooks, and incident commanders. I know what good triage looks like when you have the resources.
Lately I’ve been building something on the side as a learning exercise — a small project with real users, real infrastructure, and no team behind it. It’s been a great way to explore some ideas I’ve had about AI-assisted operations.
That constraint forced me to look at incident triage differently. The bottleneck wasn’t my monitoring tools. It wasn’t my experience — I’ve handled hundreds of production incidents across my career. The bottleneck was the gap between receiving an alert and having enough assembled context to make a decision.
Every tool I tried to close that gap failed for the same structural reason. They coupled things that needed to be independent.
That realization became NANO-SRE, an open-source incident triage stack. But the tool is secondary. The pattern underneath it — a three-layer separation for AI-augmented operations — is what I think matters, and it generalizes well beyond my situation.
The triage problem when you’re the entire team
I’ve been on both sides of incident response. On a team with an incident commander, a dedicated communicator, and engineers who can split up to investigate different hypotheses simultaneously — triage is a coordination problem. Without that support, it becomes a cognitive load problem.
Four failure modes that I kept running into:
Alert storm correlation failure. One root cause triggers 50+ downstream alerts. I understand the service topology because I built it, but understanding doesn’t help when I’m wading through a wall of notifications trying to isolate the signal. According to research from incident.io, teams receive over 2,000 alerts weekly, with only 3% needing immediate action.
No incident memory. “Didn’t I see this exact pattern three weeks ago?” That question comes up constantly because incident knowledge lives in my head, not my systems. On teams, we had post-mortems, runbooks, Jira tickets. When you’re also building features, handling support, and shipping updates, the documentation discipline slips. The 2025 Catchpoint SRE Report found overloaded teams see MTTR stretch to roughly 4 hours. Without institutional memory, you’re compounding that problem every time a known pattern recurs.
Context fragmentation. Good triage requires metrics (Grafana), errors (Sentry), logs (Loki), and deployment history. On a team, different people pull different sources. Without that, it’s six browser tabs and zero correlation. You become the integration layer between your own tools.
Decision paralysis under pressure. Rollback or hotfix? Config change or code fix? On a team, you can talk through the trade-offs. Without that support, it’s just you, incomplete data, and time pressure. I have years of experience making these calls, but experience doesn’t eliminate the cognitive cost of making those decisions without anyone to sanity-check you.
Everything I tried first (and why it failed)
I’ve shipped production systems for a long time. I’m not easily impressed by tooling demos. So when I tried to build AI-assisted triage, I started with the obvious approaches — and learned why they’re obvious but wrong:
| Approach | What Happened | What It Taught Me |
|---|---|---|
| Claude in Slack | Hit context limits after three correlated alerts. Useful for straightforward questions, completely broke down during multi-service triage with interleaved log data. | The conversational interface and the reasoning engine have fundamentally different context requirements. Separate them. |
| Vector database for logs | Semantic search took 15 seconds per query. Technically interesting, but when production is down and users are affected, 15 seconds feels like an eternity. | Triage requires speed over precision. Fast correlation beats perfect recall every time. |
| Monolith chatbot | Every new provider integration (Sentry, then Grafana, then Loki) broke something upstream. Classic coupling problem — the kind I’d never tolerate in production code but somehow accepted in tooling. | AI-augmented workflows need the same separation of concerns that makes any well-architected system work. |
Each failure refined the architecture. By the third iteration, the pattern was clear: three independent layers, loosely coupled, each optimized for its specific job.
The Three-Layer AI Operations Pattern
This is the architectural insight at the heart of NANO-SRE. It’s not novel in the way that a new algorithm is novel — it’s the application of separation of concerns to a domain where people keep building monoliths instead.
Layer 1: Conversational Interface
Component: Nanobot (Telegram-first)
The interface layer handles notification, routing, and human interaction. It needs to be fast, mobile-native, and zero-friction. This layer should be thin. It doesn’t need to be smart. It needs to get information in front of me quickly and let me act on it.
Why Telegram? Because I’m not always at my desk. When an incident hits, I need to assess the situation from wherever I am. Channel routing keeps incidents organized. Periodic summaries let me catch up asynchronously. Zero training required — which matters because “training” implies a team, and I don’t have one.
In practice, the vast majority of my initial incident responses happen from a phone. If you can’t triage from your phone, the tool fails exactly when it matters most.
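To make the "thin interface" point concrete, here’s a minimal sketch of what this layer does and nothing more: format an enriched alert and push it to a Telegram chat via the Bot API. This is not Nanobot’s code, and the alert fields (title, correlation_id, summary) are illustrative assumptions.

```python
# Minimal sketch of a thin interface layer: format and forward, no analysis.
# Not Nanobot's implementation; alert fields are illustrative assumptions.
import os

import requests

SEND_MESSAGE_URL = "https://api.telegram.org/bot{token}/sendMessage"


def notify(alert: dict, chat_id: str) -> None:
    """Push an enriched alert summary to a Telegram chat. Nothing smart happens here."""
    text = (
        f"ALERT: {alert['title']}\n"
        f"Correlation: {alert['correlation_id']}\n"
        f"{alert['summary']}"
    )
    requests.post(
        SEND_MESSAGE_URL.format(token=os.environ["TELEGRAM_BOT_TOKEN"]),
        json={"chat_id": chat_id, "text": text},
        timeout=5,
    )
```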
Layer 2: Data Aggregation & Memory
Component: Keep (open-source AIOps platform)
The data layer handles alert aggregation, persistent history, and automatic enrichment. Keep unifies alerts from Sentry, Grafana, Loki, and custom webhooks into a single source of truth. It correlates with recent deploys and previous incident patterns. Incident knowledge survives service restarts.
This is the layer I wish I’d had years ago on actual SRE teams. Institutional memory that doesn’t depend on anyone remembering to write a post-mortem. It’s the difference between diagnosing a known issue in seconds and burning thirty minutes rediscovering something you already solved.
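For illustration only — this is not Keep’s API — here is roughly what "enrichment" means in this layer: take a raw alert and attach recent deploys and similar past incidents before anything downstream sees it. Field names are assumptions.

```python
# Illustrative enrichment sketch -- not Keep's API. Field names are assumptions.
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class EnrichedAlert:
    raw: dict
    recent_deploys: list[dict] = field(default_factory=list)
    similar_incidents: list[dict] = field(default_factory=list)


def enrich(alert: dict, deploys: list[dict], history: list[dict]) -> EnrichedAlert:
    """Attach deploys from the last 30 minutes and past incidents on the same service."""
    fired_at = datetime.fromisoformat(alert["fired_at"])
    cutoff = fired_at - timedelta(minutes=30)
    return EnrichedAlert(
        raw=alert,
        recent_deploys=[
            d for d in deploys if datetime.fromisoformat(d["deployed_at"]) >= cutoff
        ],
        similar_incidents=[i for i in history if i["service"] == alert["service"]],
    )
```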
Layer 3: Deep Reasoning Engine
Component: Custom triage harness (built on Pi Agent SDK)
The reasoning layer handles the cognitive heavy lifting: cross-service correlation, pattern matching against historical incidents, and generating structured verdicts with severity, category, confidence, and suggested actions. It produces HTML artifacts — portable incident reports that serve as automatic documentation.
Unlike a chatbot constrained by message-length limits, the harness maintains extended context for long triage sessions. It catches patterns I’d miss under pressure — not because I lack the experience to spot them, but because no one can hold six data sources in working memory simultaneously while also deciding what to do about them.
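A sketch of the verdict shape this layer produces, using the fields named above (severity, category, confidence, suggested actions). The actual schema in the repo may differ; the point is that the output is structured data and a portable report, not chat text.

```python
# Sketch of a structured verdict; the repo's actual schema may differ.
from dataclasses import dataclass


@dataclass
class TriageVerdict:
    severity: str                 # e.g. "critical", "high", "medium", "low"
    category: str                 # e.g. "database", "deploy", "dependency"
    confidence: float             # 0.0 - 1.0
    summary: str                  # one-paragraph root-cause hypothesis
    suggested_actions: list[str]
    matched_incidents: list[str]  # IDs of similar historical incidents


def to_report(verdict: TriageVerdict) -> str:
    """Render the verdict as a minimal, portable HTML incident report."""
    actions = "".join(f"<li>{a}</li>" for a in verdict.suggested_actions)
    return (
        f"<h1>{verdict.severity.upper()} / {verdict.category}</h1>"
        f"<p>Confidence: {verdict.confidence:.0%}</p>"
        f"<p>{verdict.summary}</p>"
        f"<ul>{actions}</ul>"
    )
```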
Why the separation matters
Each layer evolves independently. Swap models — Claude to Gemini to a local model — without touching the chat interface. Add alert providers without changing reasoning logic. Replace Telegram with Slack (on the roadmap) without rebuilding enrichment or analysis.
I applied the same principle to this stack that I’d apply to any production system I’ve ever built: loosely coupled components with clear contracts. It’s not exciting architecture. It’s correct architecture.
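In code terms, "clear contracts" looks something like this — each layer depends on a narrow interface rather than a concrete implementation. The class and method names below are illustrative, not the repo’s.

```python
# Illustrative contracts between layers; names are not the repo's.
from typing import Protocol


class ChatChannel(Protocol):
    def send(self, chat_id: str, text: str) -> None: ...


class ReasoningEngine(Protocol):
    def triage(self, enriched_alert: dict) -> dict: ...


def handle_alert(alert: dict, channel: ChatChannel, engine: ReasoningEngine) -> None:
    """Interface and reasoning are injected, so either can be swapped independently."""
    verdict = engine.triage(alert)
    channel.send(alert["chat_id"], verdict["summary"])
```

Swapping Telegram for Slack means a new ChatChannel implementation; swapping models means a new ReasoningEngine. Nothing else moves.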
Best model for each phase
One thing that fell out of the separation: using different models for different triage phases became trivial.
When alerts include screenshots — Sentry error captures, Grafana dashboard snapshots, user-reported visual bugs — image-first analysis is a real advantage. Gemini Flash handles multimodal triage at low latency and low cost. Across dozens of my own production incidents, image-first analysis consistently surfaced initial hypotheses faster than text-only approaches, because dashboard screenshots give the model spatial information that structured log text can’t provide.
Then the harness escalates to Claude Sonnet for deeper causal reasoning when the situation is complex. Best model for each phase — not one model forced to handle everything. The three-layer separation makes this a configuration change, not a refactor.
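Here’s a rough sketch of what that configuration change looks like against OpenRouter’s OpenAI-compatible API. The model IDs are examples — check OpenRouter’s catalog for current names — and the phase names are my own illustration, not the harness’s config keys.

```python
# Phase-based model routing via OpenRouter's OpenAI-compatible endpoint.
# Model IDs are examples; phase names are illustrative, not the harness's keys.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

PHASE_MODELS = {
    "first_pass": "google/gemini-flash-1.5",          # fast, cheap, multimodal
    "deep_reasoning": "anthropic/claude-3.5-sonnet",  # escalation for complex incidents
}


def run_phase(phase: str, messages: list[dict]) -> str:
    """Route a triage phase to its configured model -- a config change, not a refactor."""
    response = client.chat.completions.create(
        model=PHASE_MODELS[phase],
        messages=messages,
    )
    return response.choices[0].message.content
```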
Real triage: from alert to action
Here’s the flow when a known pattern recurs — the best case that demonstrates why incident memory matters:
```
0s        2s          7s         8s               23s        30s
│─────────│───────────│──────────│────────────────│──────────│
Alert     Keep        Nanobot    Harness          You        Root cause
fires     enriches    notifies   analyzes         decide     identified
          +correlates            +historical
          +deploys                match
```

- Alert fires (0-2s) — Loki catches elevated error rate across the payment service.
- Keep enriches (2-7s) — Correlates with a deployment 17 minutes prior. Retrieves previous connection pool incidents. Tags similar patterns.
- Nanobot notifies (instant) — Telegram message with correlation ID and enrichment summary.
- Harness analyzes (8-23s) — Checks log patterns against historical matches, assesses related service health, generates a structured verdict.
- You decide (23-30s) — HTML report: “High confidence. Database connection pool exhausted. Previous pattern match. Suggested action: restart pool, increase max_connections from 100 to 200.”
Root cause identified. Historical context delivered. Specific remediation suggested. No tab-switching. No re-diagnosis.
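For reference, the structured data behind that report looks roughly like this — the values come from the example above, the schema is illustrative:

```python
# The example verdict above as structured data; schema is illustrative.
verdict = {
    "severity": "high",
    "category": "database",
    "confidence": 0.9,  # "High confidence"
    "summary": "Database connection pool exhausted; matches a previous incident pattern.",
    "suggested_actions": [
        "Restart the connection pool",
        "Increase max_connections from 100 to 200",
    ],
}
```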
Novel incidents take longer. But even then, the enrichment and correlation steps eliminate the manual context-assembly that eats most of triage time. After years of doing that assembly in my head, having a system do it for me is a meaningful quality-of-life improvement.
Before and after
| Dimension | Before (Manual Triage) | After (Three-Layer Pattern) |
|---|---|---|
| Time to root cause | 15-30 minutes of tab-switching and correlation | ~30 seconds for known patterns; minutes for novel incidents |
| Context correlation | Manual — you’re the integration layer across 6+ tabs | Automatic — Keep unifies Sentry, Grafana, Loki, deploy history |
| Incident memory | Your head. Post-mortems you might write if you have time | Persistent. Searchable. Every incident enriches future triage |
| Mobile capability | Laptop required. 5+ minutes to get situated | Telegram-native. Assess and act from your phone |
| Cognitive load | High — assembling context while under pressure | Low — system assembles context; you make one decision |
| Model flexibility | N/A or single vendor | Model-agnostic via OpenRouter |
| Cost | Enterprise licenses (built for teams, priced for teams) | MIT-licensed. API costs only, on your terms |
| Knowledge preservation | Inconsistent at best when you’re also the only developer | Automatic structured reports from every incident |
Open source, because infrastructure tooling should be auditable
Everything ships MIT.
- Audit every line. When your incident response depends on a system, you should see what’s running. I’ve spent enough of my career debugging vendor black boxes.
- Fork and customize. Modify reasoning logic, add proprietary providers, tune thresholds for your environment.
- Cost control. Run with OpenRouter’s free-tier models during development, pin to paid models for production. Every dollar matters and I control where they go.
Five minutes from clone to running
```bash
git clone https://github.com/ameno-/nano-sre.git
cd nano-sre
./scripts/bootstrap.sh
cp .env.example .env
# Add OPENROUTER_API_KEY and provider credentials
./scripts/up.sh

# Keep UI:  http://localhost:38000
# Harness:  http://localhost:38790/healthz
# Nanobot:  configure the Telegram webhook per docs
```

Same .env contract local and production. Full setup in the README. Provider integration guide in docs/ADDING_PROVIDERS.md.
Honest limitations
No built-in alerting rules — use your existing Prometheus, Grafana, or whatever you’ve got. Telegram-only for now (Slack on the roadmap). Self-hosted only. Requires Docker proficiency.
If you need monitoring from scratch, this isn’t it. NANO-SRE assumes you already have alerts and focuses on making triage work when you’re the only one responding to them.
Where this pattern goes next
The three-layer separation applies to any workflow requiring time-pressured decisions with incomplete information. That’s most operational work.
Adapting for security incident response means swapping Sentry for Wiz or Falco alerts — the reasoning layer barely changes. For customer support triage, swap Loki for Zendesk ticket streams. For deployment validation, the enrichment layer correlates canary metrics while the reasoning layer generates go/no-go verdicts.
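Concretely, most of that adaptation is a normalization change at the edge — map the new source into the same alert shape and the reasoning layer never notices. A hypothetical sketch (provider field names are assumptions):

```python
# Hypothetical normalizers -- provider field names are assumptions.
# Each domain maps its source into one alert shape; the reasoning layer is unchanged.
def normalize_sentry(event: dict) -> dict:
    return {"service": event["project"], "title": event["title"], "payload": event}


def normalize_falco(event: dict) -> dict:  # security incident response variant
    return {"service": event["hostname"], "title": event["rule"], "payload": event}


def normalize_zendesk(ticket: dict) -> dict:  # customer support triage variant
    return {"service": "support", "title": ticket["subject"], "payload": ticket}
```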
My prediction: In 18 months, every team running production infrastructure will have some version of this separation. The models will improve. The integrations will multiply. But the pattern — loose coupling between interface, memory, and intelligence — will endure. It follows the same principles that make all resilient distributed systems work. I’ve seen enough production architectures to recognize a durable pattern when I find one.
The origin: what OpenClaw taught me
I built NANO-SRE after months contributing to OpenClaw (back when it was still called Clawdbot). That experience crystallized something I’d felt throughout my career: the model isn’t the bottleneck — the architecture around the model is.
OpenClaw demonstrated that AI agents need persistent memory, multi-channel interfaces, and modular tool integration to be useful. NANO-SRE applies those principles to the domain I’ve spent the most time in — production infrastructure — where getting the architecture right isn’t academic. It directly determines how fast you resolve incidents and how sustainable the work is long-term.
Try the pattern
Clone the repo. Run ./scripts/bootstrap.sh. See if the separation makes your triage faster.
Open a GitHub issue if you hit a problem. See docs/PR_REVIEW_GUIDE.md if you want to contribute a provider.
If you’re building AI into operational workflows — incident triage, support, security, deployment validation — I work with engineering teams on these patterns. Years of production incident response plus hands-on AI engineering. If your team is navigating where to put the intelligence, I’ve probably already tried the approach you’re considering and can tell you what happens next.
Repository: github.com/ameno-/nano-sre
License: MIT
Stack: Nanobot + Keep + Pi Agent SDK + OpenRouter
Pattern: The Three-Layer AI Operations Pattern
References
- Alert fatigue solutions for DevOps teams (incident.io, 2025) — 2,000+ weekly alerts, only 3% actionable
- State of Incident Management 2025 (Runframe) — Toil rose to 30% despite AI investment; 73% of orgs had outages from ignored alerts
- AIOps for SRE (DevOps.com) — 70% of SREs report on-call stress impacts burnout
- Alert fatigue reduction with AI agents (IBM) — 4,484 alerts/day average; 67% ignored
- Understanding alert fatigue (Atlassian)
- OpenClaw (Wikipedia)
Key Takeaways
- The bottleneck in operations isn't monitoring — it's context assembly between alerts and decisions
- Three-layer separation (interface + memory + reasoning) mirrors production architecture patterns
- Known incident patterns go from alert to root cause in ~30 seconds vs 15-30 minutes
- Model-agnostic design falls out naturally from loose coupling
- Pattern generalizes to security response, support triage, and deployment validation