
Context Engineering: The Complete Guide for Senior Engineers

Ameno Osman


TL;DR

Context is your budget - every token loaded is a tax on productivity. This guide covers four patterns that compound: progressive disclosure (94% token reduction), file-based persistence (80% reduction in sub-agent communication), research-first sub-agent architecture (debuggable delegation), and Skills vs MCP selection (100 tokens vs 15,000 upfront). Combined, these patterns transform context management from a bottleneck into a competitive advantage.

Context is finite. Every token your agent loads before it starts working is a tax on productivity.

Most developers don’t notice this tax until their agent slows down, loses track of earlier conversation, or hits the context ceiling mid-task. By then, they’ve been paying it on every request.

This guide consolidates everything I’ve learned about context engineering into a single, actionable resource. Four patterns that compound. Working code. Hard numbers. Honest failure modes.

Why Context Management is the Real AI Engineering Skill

You have 200K tokens. Sounds like a lot. It isn’t.

Here’s what actually happens before you type a single character:

flowchart TD
    A([Agent starts]) --> B[Load ALL tools<br/>from ALL MCPs]
    B --> |10,000 tokens<br/>per MCP server| C[40,000 tokens consumed]
    C --> |20% of context GONE<br/>before any work starts| D([Agent begins work<br/>with 80% remaining])

    style C fill:#ffcdd2

Four MCP servers at 10,000 tokens each = 40,000 tokens consumed before you type a single character. That’s 20% of your 200K context window gone on overhead.

💡 The 15% Rule

If your upfront context consumption exceeds 15% of your context window, you have an architecture problem. Four MCP servers at 10K tokens each = 20% gone before you type a single character.

This isn’t a prompt engineering problem. It’s an architecture problem. And architecture problems require architectural solutions.

The patterns in this guide address different aspects of context consumption:

  1. Progressive Disclosure - Load tools only when needed (94% reduction)
  2. File-Based Persistence - Use the file system instead of conversation memory (80% reduction in sub-agent communication)
  3. Sub-Agent Architecture - Delegate research, not implementation
  4. Skills vs MCP - Choose the right tool loading strategy (100 vs 15,000 tokens upfront)

Let’s break each one down.


Part 1: Progressive Disclosure Pattern

Progressive disclosure loads tools only when needed. Instead of front-loading everything, give your agent an index of what exists - and let it load only what it uses.

flowchart TD
    A([Agent starts]) --> B[Load PRIME only<br/>index of tools]
    B --> |~500 tokens| C{Agent needs<br/>market data?}
    C --> |Yes| D[Load ONLY<br/>market_search.py]
    D --> |2,000 tokens| E([Agent continues<br/>with 98% context remaining])

    style B fill:#c8e6c9
    style E fill:#c8e6c9

Result: 2,500 tokens instead of 40,000. 94% reduction.

Implementation: Tool Index + UV Scripts

Create a tool index as a simple markdown file:

~/tools/README.md

```markdown
# Available Tools
You have access to the following tool scripts. **Do not read script contents unless --help doesn't provide enough information.**

## Market Tools
Located in: `~/tools/market/`
- `search.py` - Search prediction markets by keyword
- `get_market.py` - Get details for specific market ID
- `get_orderbook.py` - Get current orderbook for market

## Data Tools
Located in: `~/tools/data/`
- `fetch_csv.py` - Download CSV from URL
- `analyze_csv.py` - Basic statistical analysis
- `transform_csv.py` - Transform/filter CSV data

## When you need a tool:
1. Run `uv run ~/tools/{category}/{script}.py --help`
2. Read the help output to understand usage
3. Run the tool with appropriate arguments
4. Only read script source if help is insufficient
```

This index costs ~200 tokens. The agent knows WHERE tools are without loading them.
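
If you maintain more than a handful of scripts, the index doesn't have to be hand-written. A minimal sketch that regenerates it from the scripts on disk (the `~/tools/<category>/<script>.py` layout and the docstring-as-summary convention are assumptions; adapt to your own tools):

```python
"""Regenerate ~/tools/README.md from the tool scripts on disk (illustrative sketch)."""
import ast
from pathlib import Path

TOOLS_DIR = Path.home() / "tools"  # assumed layout: ~/tools/<category>/<script>.py

def summary(script: Path) -> str:
    """Use the first line of the script's docstring as its one-line description."""
    try:
        doc = (ast.get_docstring(ast.parse(script.read_text())) or "").strip()
    except SyntaxError:
        doc = ""
    return doc.splitlines()[0] if doc else "No description"

lines = [
    "# Available Tools",
    "You have access to the following tool scripts. "
    "**Do not read script contents unless --help doesn't provide enough information.**",
]
for category in sorted(p for p in TOOLS_DIR.iterdir() if p.is_dir()):
    lines += [f"## {category.name.title()} Tools", f"Located in: `~/tools/{category.name}/`"]
    lines += [f"- `{s.name}` - {summary(s)}" for s in sorted(category.glob("*.py"))]

(TOOLS_DIR / "README.md").write_text("\n".join(lines) + "\n")
```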

Each tool is a UV single-file script with embedded dependencies:

~/tools/market/search.py

```python
#!/usr/bin/env -S uv run
# github: https://github.com/ameno-/acidbath-code/blob/main/agentic-patterns/context-engineering/when-you-need/when_you_need.py
# /// script
# dependencies = [
#   "requests>=2.31.0",
#   "rich>=13.0.0",
# ]
# ///
"""
Search prediction markets by keyword.

Usage:
    uv run search.py --query "election" [--limit 10]

Arguments:
    --query, -q   Search term (required)
    --limit, -l   Max results (default: 10)
    --format, -f  Output format: json|table (default: table)
"""
import argparse
import json

import requests
from rich.console import Console
from rich.table import Table

KALSHI_API = "https://trading-api.kalshi.com/trade-api/v2"


def search_markets(query: str, limit: int = 10) -> list:
    """Search Kalshi markets by keyword."""
    response = requests.get(
        f"{KALSHI_API}/markets",
        params={"status": "open", "limit": limit},
        headers={"Accept": "application/json"},
    )
    response.raise_for_status()
    markets = response.json().get("markets", [])
    # Filter by query in title
    return [m for m in markets if query.lower() in m.get("title", "").lower()][:limit]


def main():
    parser = argparse.ArgumentParser(description="Search prediction markets")
    parser.add_argument("-q", "--query", required=True, help="Search term")
    parser.add_argument("-l", "--limit", type=int, default=10, help="Max results")
    parser.add_argument("-f", "--format", choices=["json", "table"], default="table")
    args = parser.parse_args()

    markets = search_markets(args.query, args.limit)

    if args.format == "json":
        print(json.dumps(markets, indent=2))
    else:
        console = Console()
        table = Table(title=f"Markets matching '{args.query}'")
        table.add_column("ID", style="cyan")
        table.add_column("Title", style="green")
        table.add_column("Volume", justify="right")
        for m in markets:
            table.add_row(
                m.get("ticker", ""),
                m.get("title", "")[:50],
                str(m.get("volume", 0)),
            )
        console.print(table)


if __name__ == "__main__":
    main()
```

The Flow in Practice

User: "What's the current price on election markets?"
Agent:
1. Reads ~/tools/README.md (200 tokens)
2. Sees market/search.py exists
3. Runs: uv run ~/tools/market/search.py --help (500 tokens)
4. Learns usage from help output
5. Runs: uv run ~/tools/market/search.py -q "election"
6. Returns results
Total context: ~2,700 tokens
MCP equivalent: ~10,000 tokens

Progressive Disclosure Numbers

| Approach | Initial Load | Per-Tool Cost | 4 Tools Used |
|----------|--------------|---------------|--------------|
| MCP Server | 10,000 tokens | 0 (pre-loaded) | 10,000 |
| Progressive | 200 tokens | 500-2,000 | 2,200-8,200 |

Progressive disclosure wins when you use fewer than all available tools - which is almost always. Most tasks use 2-3 tools out of 20 available.
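
The break-even is easy to sanity-check for your own setup. A back-of-the-envelope calculator using the assumed costs from the table above (200-token index, up to ~2,000 tokens per tool loaded, ~10,000 tokens per pre-loaded MCP server):

```python
def progressive_cost(tools_used: int, index_tokens: int = 200,
                     per_tool_tokens: int = 2_000) -> int:
    """Worst-case context cost when tools are loaded on demand."""
    return index_tokens + tools_used * per_tool_tokens

def mcp_cost(servers: int = 1, per_server_tokens: int = 10_000) -> int:
    """Upfront context cost when a server's tools are all pre-loaded."""
    return servers * per_server_tokens

for n in (1, 2, 3, 5):
    print(f"{n} tools used: progressive={progressive_cost(n):,} vs MCP={mcp_cost():,} tokens")
# 3 tools used: progressive=6,200 vs MCP=10,000 tokens
```

Even at the worst-case per-tool cost, on-demand loading stays ahead until a task actually uses most of what a server offers.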

Semantic Search: When Text Search Fails

For large codebases (1,000+ files), progressive disclosure alone isn’t enough. The problem shifts from tool loading to search strategy.

Brute-force text search doesn’t scale. Consider a symbol rename:

flowchart TD
    TASK["Task: Rename UserService -> UserManager"]
    TASK --> S1

    subgraph S1["Step 1: grep -r"]
        G1[847 matches across 312 files]
        G2[Includes comments, strings, similar names]
        G3[15,000 tokens consumed]
    end

    S1 --> S2
    subgraph S2["Step 2: Read files"]
        R1[Agent reads each file]
        R2[Looking for actual type usage]
        R3[More tokens consumed]
    end

    S2 --> S3
    subgraph S3["Step 3: Make changes"]
        C1[Text replacement]
        C2[Misses generic constraints]
        C3[Misses reflection usage]
    end

    S3 --> S4
    subgraph S4["Step 4: Build fails"]
        F1[Agent scans again]
        F2[More tokens]
        F3[Loop continues...]
    end

    S4 --> |Retry loop| S1
    S4 --> RESULT[3 hours, 28M tokens, $14, still broken]

    style RESULT fill:#ffcdd2

The build-fail-retry cycle is the productivity bottleneck.

Semantic search with tools like Serena MCP understands code structure. It knows the difference between a type name, a string literal that happens to contain that text, and a comment mentioning it.

| Metric | Text Search | Semantic Search | Improvement |
|--------|-------------|-----------------|-------------|
| Time | 3 hours | 5 minutes | 36x |
| Tokens | 28M | 1M | 28x |
| Cost | $14 | $0.60 | 23x |
| Human intervention | Constant | None | - |
| Build failures | 4 | 0 | - |
Real cost savings: $2.40 per 100 queries with semantic search vs $86.40 with full context loading.

Semantic Search Limitations

Semantic tools can’t catch runtime-only issues, reflection-heavy code patterns, or cross-service boundaries. They excel at static analysis but won’t save you from integration test failures.

Ask these questions:

  1. How many files will be touched?

    • < 10 files: Text search is fine
    • 10-100 files: Text search with careful prompting
    • 100-1,000 files: Consider semantic
    • 1,000+ files: Semantic is required
  2. What’s the task complexity?

    • String replacement: Text search works
    • Symbol renaming: Semantic required
    • Cross-file refactoring: Semantic required
    • Type hierarchy changes: Semantic required
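
The same checklist can be folded into a small helper. A sketch using the thresholds from the questions above (the task labels are illustrative):

```python
def pick_search_strategy(files_touched: int, task: str) -> str:
    """Rough mapping of the checklist above onto a search strategy."""
    needs_structure = task in {"symbol_rename", "cross_file_refactor", "type_hierarchy_change"}
    if needs_structure or files_touched >= 1_000:
        return "semantic"
    if files_touched >= 100:
        return "semantic (consider)"
    return "text"

print(pick_search_strategy(312, "symbol_rename"))    # semantic
print(pick_search_strategy(8, "string_replacement")) # text
```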

Part 2: File-Based Context Persistence

Conversation history gets compacted. Files don’t.

This is the key insight that changes how you think about agent delegation and long-running tasks.

Why Conversation History Fails

When you’re working on a complex task that spans multiple agent invocations, conversation history becomes unreliable:

  • Compaction algorithms summarize and lose detail
  • Context limits force truncation of earlier messages
  • Cross-agent communication loses context entirely

The file system solves all three problems. It’s persistent, searchable, and accessible to any agent.

The File System as Context Management

sequenceDiagram
    participant P as Parent Agent
    participant F as File System
    participant S as Sub-Agent (researcher)

    P->>F: Write context.md
    P->>S: Research task
    S->>F: Read context.md
    Note over S: Researches documentation
    Note over S: Creates plan
    S->>F: Write research-report.md
    S-->>P: "Research complete: see research-report.md"

    Note over P: Only ~50 tokens returned

    P->>F: Read research-report.md
    Note over P: Full context now available<br/>• Implements with complete understanding<br/>• Can debug because it knows the plan

    Note over P,S: Result: Implementation succeeds, debugging is possible

Token reduction: 80%

Before: Sub-agent returns full research in conversation (10,000+ tokens)
After: Sub-agent returns file path (50 tokens)

Parent reads file on-demand when ready to implement.
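
The mechanics are nothing more than file I/O wrapped around the delegation call. A minimal sketch with the sub-agent stubbed out (run_researcher is a placeholder, not a real SDK call):

```python
from pathlib import Path

def run_researcher(task: str, context_file: Path, report_file: Path) -> str:
    """Placeholder for spawning the research sub-agent.

    A real researcher would read context_file, do its research, write the full
    findings to report_file, and return only a short confirmation string.
    """
    report_file.write_text(f"# Research Report\n\nFindings for: {task}\n")
    return f"Research complete: see {report_file}"

tmp = Path("tmp")
tmp.mkdir(exist_ok=True)
context_file = tmp / "context.md"
report_file = tmp / "research-report.md"

context_file.write_text("# Project Context\n## Research Needed\n1. ...\n")  # parent persists context
summary = run_researcher("Add Stripe checkout", context_file, report_file)  # ~50 tokens returned
plan = report_file.read_text()                                              # full plan loaded only when needed
print(summary)
```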

Context.md Template

Create .claude/templates/context.md:

```markdown
# Project Context
## Current State
<!-- What exists now, what's working, what's not -->
## Research Needed
<!-- Specific questions the sub-agent should answer -->
1.
2.
3.
## Constraints
<!-- Hard requirements, tech stack, patterns to follow -->
- Must use:
- Cannot use:
- Style:
## Files to Review
<!-- Specific files relevant to this task -->
-
-
## Output Expected
<!-- What should be in the research report -->
- Implementation plan
- Code examples
- Potential issues
- Recommended approach
```

Research Reports Pattern

The sub-agent writes findings to a file, not the conversation. This creates:

  1. Persistent knowledge - Research reports accumulate in your project. Future work references past decisions.
  2. Debuggable workflows - Always know what was planned and why. Three months later, git log shows the research report that led to the implementation.
  3. Efficient context - Parent agent loads research only when needed, not carried through entire conversation.

Part 3: Sub-Agent Architecture

Custom agents enable specialization. Specialization enables delegation. But delegation has a critical problem:

sequenceDiagram
    participant P as Parent Agent
    participant S as Sub-Agent (isolated)

    P->>S: "Implement Stripe checkout"
    Note over S: Reads 50 files
    Note over S: Makes decisions
    Note over S: Writes code
    S-->>P: "Task completed"

    Note over P: Parent sees ONLY:<br/>• Task was assigned<br/>• Task is "complete"
    Note over P: Parent does NOT see:<br/>• Which files were read<br/>• What decisions were made<br/>• Why approaches were chosen

    Note over P,S: Result: When something breaks, nobody knows why

The parent agent has limited information about what the sub-agent actually did. When something isn’t 100% correct and you want to fix it - that’s where everything breaks down.

The Delegation Trap

Sub-agents as implementers fail because the parent can’t see what they did. When something breaks, nobody knows why. Sub-agents should be researchers - they gather context, the parent implements.

The Right Way: Research Delegation

sequenceDiagram
    participant U as User
    participant P as Parent Agent
    participant F as File System
    participant S as Sub-Agent (Researcher)

    U->>P: "Add Stripe checkout"
    P->>F: Write context.md
    Note over F: Project state, constraints, questions

    P->>S: "Research Stripe checkout. Context: ./context.md"
    S->>F: Read context.md
    S->>S: Research codebase (Glob, Grep)
    S->>S: Fetch Stripe docs (WebFetch)
    S->>S: Create implementation plan
    S->>F: Write research-report.md

    S-->>P: "Research complete. See research-report.md"
    Note over P: Only ~50 tokens returned

    P->>F: Read research-report.md
    Note over P: Full plan now in parent's context

    P->>P: Implement based on plan
    P->>F: Write/Edit source files
    P->>U: "Implementation complete"

Research Agent Definition

Create .claude/agents/researcher.md:

```markdown
---
name: researcher
description: Research sub-agent that gathers information and creates implementation plans
tools: Read, Glob, Grep, WebFetch, Write
model: haiku
---
# Research Agent

You are a research sub-agent. Your job is to gather information and create detailed implementation plans. **You do NOT implement anything.**

## Workflow
1. **Read the context file**
   - Always start by reading the context.md file passed to you
   - Understand what's being asked and the constraints
2. **Research the codebase**
   - Find relevant existing code using Grep and Glob
   - Understand current patterns and conventions
   - Identify dependencies and interfaces
3. **Research external documentation**
   - If the task involves external services, fetch their docs
   - Find best practices and examples
   - Note any recent API changes
4. **Create implementation plan**
   - Step-by-step instructions for implementation
   - Include actual code snippets where helpful
   - Note potential issues and how to handle them
   - List files that will need to be modified
5. **Write research report**
   - Save to the location specified in context.md
   - Use clear sections matching the output expected
   - Include confidence levels for recommendations
6. **Return summary only**
   - Tell the parent agent: "Research complete. Report saved to [path]"
   - Do NOT include the full report in your response
   - Keep the summary under 100 words

## Rules
- NEVER implement code, only plan it
- NEVER call other sub-agents
- ALWAYS write findings to file, not conversation
- ALWAYS read context.md first
```

Specialized Researcher Agents

Build specialized research agents for services you use frequently:

Stripe Research Agent (.claude/agents/stripe-researcher.md):

````markdown
---
name: stripe-researcher
description: Research Stripe integration patterns and best practices
tools: Read, Glob, Grep, WebFetch, Write
model: haiku
---
# Stripe Research Agent

You research Stripe integrations. You have access to Context7 MCP for up-to-date Stripe documentation.

## Knowledge Base
- Stripe API docs: https://stripe.com/docs/api
- Webhooks guide: https://stripe.com/docs/webhooks
- Best practices: https://stripe.com/docs/best-practices

## Research Areas
- Payment intents vs charges (use payment intents)
- Webhook event handling
- Error handling patterns
- Testing with test mode keys
- PCI compliance considerations

## Output Format
```yaml
recommendation:
  approach: "description"
  confidence: high|medium|low
  stripe_api_version: "2024-xx-xx"
implementation_steps:
  - step: 1
    action: "what to do"
    code: |
      // example code
potential_issues:
  - issue: "description"
    mitigation: "how to handle"
files_to_modify:
  - path: "file path"
    changes: "what changes"
```
````

Model Selection for Sub-Agents

| Task Type | Model | Cost/M | Speed |
|-----------|-------|--------|-------|
| Simple routing | Haiku | $0.25 | Fast |
| Text extraction | Haiku | $0.25 | Fast |
| Research (simple) | Haiku | $0.25 | Fast |
| Code review | Sonnet | $3 | Medium |
| Implementation | Sonnet | $3 | Medium |
| Research (complex) | Sonnet | $3 | Medium |
| Complex reasoning | Opus | $15 | Slow |
| Architecture decisions | Opus | $15 | Slow |

Rule: Use the cheapest model that solves the problem. Most research tasks are Haiku tasks. Don't over-engineer.

Rules That Prevent Disasters

Add these to your research agent definitions:

```markdown
## Mandatory Rules
1. **Always read context file first**
   - Never start work without understanding the context
   - If context file doesn't exist, stop and report error
2. **Never implement, only research**
   - Your job is to create the plan
   - The parent agent implements
3. **Never spawn sub-agents**
   - One level of delegation maximum
   - Prevents recursive loops and cost explosions
4. **Always write findings to file**
   - Summary in conversation: < 100 words
   - Full report in file: as detailed as needed
5. **Update context file when done**
   - Add "Last researched: [timestamp]"
   - Note any assumptions made
```

One level of delegation. Researcher agents never spawn their own sub-agents. This prevents recursive delegation loops, cost explosions, context fragmentation, and debugging nightmares.


Part 4: Skills vs MCP Decision Framework

Skills use 100 tokens of metadata. Then load instructions only when triggered.

That’s a 99% reduction in upfront context cost compared to MCP servers. If you’ve been burning 10K-15K tokens on tool loading before your first message even lands, this is the architectural fix you didn’t know existed.

💡 The Token Math

100 tokens vs 15,000 tokens. Skills load 150x less upfront. That’s not an optimization - it’s a different architecture.

How Skills Architecture Works

Skills use a 3-tier progressive loading system:

Tier 1: Metadata (Loaded at Startup)

```yaml
---
name: pdf-processing
description: Extracts text and tables from PDF files, fills forms, and merges documents. Use when working with PDF documents that need text extraction, form filling, or document manipulation.
---
```

This costs approximately 100 tokens. Claude loads this metadata for all available Skills at startup. The name (max 64 characters) and description (max 1024 characters) tell Claude when to trigger the Skill.

Tier 2: Full Instructions (Loaded on Trigger)

When Claude decides the Skill is relevant, it loads the full SKILL.md file. This contains detailed workflow instructions, code examples, validation steps, and error handling patterns. Recommended maximum: 500 lines. Typically costs under 5K tokens.

Tier 3: Reference Files (Loaded on Demand)

Skills can reference additional files:

```
pdf-processing/
├── SKILL.md                # Main instructions
├── scripts/
│   ├── extract_text.py     # Utility script
│   ├── fill_form.py        # Utility script
│   └── merge_pdfs.py       # Utility script
└── reference/
    ├── pdf_libraries.md    # Library comparison
    └── common_patterns.md  # Usage patterns
```

Claude reads these files only when explicitly needed. Zero token cost until accessed.

Token Economics: Real Numbers

MCP Token Cost (Upfront)

  • Server registration: ~2K tokens
  • Tool schemas: ~8K-13K tokens per server
  • Total before first message: 10K-15K tokens

Skills Token Cost (Progressive)

  • Metadata at startup: ~100 tokens (64 char name + 1024 char description)
  • Full instructions when triggered: less than 5K tokens
  • Reference files: 0 tokens until read

The difference? Skills load on-demand. MCP loads everything upfront.

When to Use What

Use MCP When:

  1. External Services: Database connections, API integrations, cloud services
  2. Real-time Data: Stock prices, weather, live metrics
  3. Bidirectional Communication: Writing to databases, posting to APIs
  4. Complex State Management: Multi-step transactions, session management
  5. System Operations: Docker, git operations requiring persistent state

Use Skills When:

  1. Document Generation: Excel, PowerPoint, PDF creation
  2. Deterministic Workflows: Code formatting, file processing, data transformation
  3. Reusable Expertise: Design patterns, coding standards, analysis frameworks
  4. Template-based Tasks: Report generation, document formatting
  5. Offline Operations: Everything can run in code execution environment

Skills vs MCP Decision

Use Skills for: Document generation, deterministic workflows, template-based tasks, offline operations. Use MCP for: External APIs, real-time data, bidirectional communication, complex state.
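
The same criteria can be expressed as a one-screen rule of thumb. A sketch (the flags mirror the lists above; real tasks often combine both, as in the full example later in this guide):

```python
def skills_or_mcp(needs_network: bool, needs_realtime_data: bool,
                  writes_to_external_system: bool, needs_persistent_state: bool) -> str:
    """Map the decision criteria above onto a tool-loading strategy."""
    if any([needs_network, needs_realtime_data, writes_to_external_system, needs_persistent_state]):
        return "MCP"     # external services, live data, bidirectional writes, stateful sessions
    return "Skills"      # document generation, deterministic/offline, template-based work

print(skills_or_mcp(False, False, False, False))  # Skills -> e.g. generate an Excel report
print(skills_or_mcp(True, True, False, False))    # MCP    -> e.g. query live market data
```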

Skills Limitations (Learned the Hard Way)

Network Access: Skills run in code execution environment. No direct network access. Use MCP for network operations, Skills for processing the data after retrieval.

Version Pinning: Using "version": "latest" means Anthropic can update the Skill between runs. Pin to specific versions in production: "version": "1.2.0".

File Lifetime: Generated files expire quickly on Anthropic’s servers. Download files immediately. Don’t try to retrieve file_id later.
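
A pinned configuration might look like the sketch below. The "skills" block mirrors the container example later in this guide; the "version" field and the immediate-download note are assumptions to verify against the current Skills documentation:

```python
# Sketch of a pinned-version Skills container config. "version" is an assumption
# based on the guidance above; confirm the exact field name in the Skills docs.
container = {
    "skills": [{
        "type": "anthropic",
        "skill_id": "xlsx",
        "version": "1.2.0",  # pin instead of "latest" so the Skill can't change between runs
    }]
}

# Generated files expire quickly on the server side: download them in the same
# run rather than storing a file_id for later retrieval.
```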


Part 5: Putting It Together

These patterns compound. Here’s a full workflow combining all four:

Complete Example: Adding Stripe Checkout

Step 1: Parent agent receives task

User: “Add Stripe checkout to the e-commerce flow”

Step 2: Create context file (File-Based Persistence)

```markdown
# Project Context: Stripe Checkout
## Current State
- E-commerce app using Next.js 14
- Cart functionality complete in src/components/Cart.tsx
- No payment processing currently
## Research Needed
1. Which Stripe API version should we use?
2. Payment Intents vs Charges API for one-time purchases?
3. What webhook events for payment confirmation?
4. Test mode setup requirements?
## Constraints
- Must use: TypeScript, Next.js API routes
- Cannot use: Deprecated Charges API
- Style: Match existing error handling in src/lib/errors.ts
## Files to Review
- src/components/Cart.tsx
- src/lib/errors.ts
- package.json (for existing deps)
## Output Expected
- Implementation plan with code snippets
- List of files to create/modify
- Webhook handling approach
- Testing strategy
```

Step 3: Spawn research sub-agent (Sub-Agent Architecture)

Task: "Research Stripe checkout. Read context from ./tmp/context-stripe.md.
Write report to ./tmp/research-stripe.md"
Agent: stripe-researcher
Model: haiku (fast, cheap)

Step 4: Sub-agent researches and writes report

Sub-agent uses progressive disclosure - only loads the tools it needs from the tool index. Writes findings to file, returns only: “Research complete. Report saved to ./tmp/research-stripe.md”

Step 5: Parent reads research and implements

Parent agent reads the research report, now has full context, implements based on the plan.

Step 6: Document generation (Skills)

If the task requires generating documentation or reports, trigger Skills instead of loading MCP servers:

```python
container = {
    "skills": [{
        "type": "anthropic",
        "skill_id": "xlsx"
    }]
}
```

100 tokens loaded for Skill metadata vs 10K+ for MCP server.

Cost Analysis Across Patterns

| Pattern | Token Reduction | Use Case |
|---------|-----------------|----------|
| Progressive Disclosure | 94% (40K to 2.5K) | Tool loading |
| File-Based Persistence | 80% | Sub-agent communication |
| Research Delegation | Debuggability | Complex tasks |
| Skills vs MCP | 99% upfront (100 vs 15K) | Document generation |

Combined savings on a complex task:

  • Without patterns: ~65,000 tokens (4 MCPs + sub-agent context + repeated tool loading)
  • With patterns: ~8,000 tokens (progressive loading + file-based transfer + Skills)
  • Reduction: ~87%

Decision Tree: When to Use Which Pattern

flowchart LR
    subgraph MCP["Standard MCP (80%)"]
        M1[External APIs]
        M2[Vendor tools]
        M3[Quick prototyping]
        M4["Small codebases (< 100 files)"]
    end

    subgraph PD["Progressive Disclosure (15%)"]
        P1[Custom internal tools]
        P2[Team-wide utilities]
        P3[Long-running sessions]
        P4[Precise token control]
    end

    subgraph SS["Skills + File-Based (5%)"]
        S1[Document generation]
        S2[Complex delegation]
        S3[Multi-agent workflows]
        S4[Persistent knowledge]
    end

    subgraph COMBO["Combine All Patterns"]
        C1[Load semantic MCP only when refactoring]
        C2[Keep tool index lightweight]
        C3[Skills for docs, MCP for APIs]
        C4[File system for delegation]
    end

    style MCP fill:#e3f2fd
    style PD fill:#fff3e0
    style SS fill:#fce4ec
    style COMBO fill:#e8f5e9

When This Fails: Honest Limitations

Progressive Disclosure Failures

Setup overhead becomes the bottleneck:

  • If you’re prototyping and need 15 different tools quickly, progressive disclosure adds friction
  • Writing UV scripts and maintaining a tool index takes time
  • For one-off tasks, the setup cost exceeds the savings

When tool reuse is low:

  • If every task needs a new custom script, you’re not saving context
  • The index becomes noise if most tools are one-time use

Sub-Agent Failures

Context isolation still exists: The researcher agent doesn’t have access to the parent’s conversation history. If critical information only exists in the parent’s memory, the researcher will miss it.

Mitigation: Be explicit in context.md. Don’t assume the researcher “knows” anything.

Research can be wrong: Haiku is fast and cheap but makes mistakes on complex analysis. The researcher might miss edge cases or misunderstand requirements.

Mitigation: Review the research report before implementing. Don’t blindly trust it.

Overhead on simple tasks: Creating context files, spawning agents, reading reports - this adds 30-60 seconds of overhead.

When to skip delegation:

  • Task takes < 2 minutes to implement directly
  • You already know exactly what to do
  • No external research needed
  • Single file change with obvious solution

Skills Failures

Network access limitations: Skills run in code execution environment. No direct network access.

File lifetime limitations: Generated files expire quickly. Download immediately.

Complex document limitations: Document generation Skills work best with 2-3 sheets/slides. Beyond that, reliability drops.


Try It Now

Week 1 Actions

Day 1-2: Measure Current Context Consumption

  1. Run /context in Claude Code to see current token usage
  2. Count your MCP servers and multiply by 10K for estimated upfront cost
  3. Calculate what percentage of 200K context window this represents
  4. If > 15%, you have an architecture problem
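
Steps 2 and 3 are one line of arithmetic. A sketch you can adapt (the 10K-per-server figure is the estimate used throughout this guide):

```python
def upfront_context_pct(mcp_servers: int, window: int = 200_000,
                        tokens_per_server: int = 10_000) -> float:
    """Estimated share of the context window consumed before any work starts."""
    return 100 * mcp_servers * tokens_per_server / window

for servers in (1, 2, 3, 4):
    pct = upfront_context_pct(servers)
    verdict = "architecture problem" if pct > 15 else "ok"
    print(f"{servers} MCP server(s) -> {pct:.0f}% of context upfront ({verdict})")
# 4 MCP server(s) -> 20% of context upfront (architecture problem)
```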

Day 3-4: Implement Progressive Disclosure

  1. Create ~/tools/README.md with an index of 3 UV scripts
  2. Point your agent at it instead of loading an MCP server
  3. Track token consumption with /context
  4. You should see 90%+ reduction in upfront token usage

Day 5-7: Test File-Based Delegation

  1. Create the context template (.claude/templates/context.md)
  2. Create the researcher agent (.claude/agents/researcher.md)
  3. Next time you need to research a new API integration, write a context file and delegate
  4. Read the research report
  5. Implement based on the plan

Measurement Baseline

Track these metrics before and after:

| Metric | Before | After | Target |
|--------|--------|-------|--------|
| Upfront tokens | ? | ? | < 15% of context |
| Per-task tokens | ? | ? | -50% |
| Sub-agent context returned | ? | ? | < 100 tokens |
| Debugging time | ? | ? | -30% |

Success Criteria

  • Upfront context consumption < 15% of window
  • Progressive disclosure active for custom tools
  • Sub-agents returning file paths, not full content
  • Skills used for document generation instead of MCP

The best context engineering is invisible. Your agent just works faster, costs less, and fails less often. That’s not an optimization. That’s a competitive advantage.


Ameno Osman profile photo

Ameno Osman

Senior Software Engineer & AI Engineering Consultant

I've spent over a decade building systems that scale to millions of users—from React frontends to GraphQL APIs to cloud infrastructure on AWS and GCP. These days, I'm obsessed with context engineering: making AI agents actually useful by teaching them to manage their own memory instead of drowning in tokens. ACIDBATH is where I document what works (and what wastes money) when you're building AI systems for real engineering work, not demos.

  • 13+ years full-stack engineering experience
  • Staff Engineer at Healthnote (2023-Present)
  • Former Tech Lead/Engineering Manager at GoodRx
  • Specializes in React, TypeScript, Node.js, GraphQL, AWS/GCP
  • Expert in AI agent architecture and context engineering

This guide consolidates content from three original posts: Context Engineering: From Token Optimization to Large Codebase Mastery, Agent Architecture: From Custom Agents to Effective Delegation, and Claude Skills Deep Dive: Progressive Loading and the MCP Alternative. The patterns have been unified and expanded with additional integration guidance.

Key Takeaways

  1. The 15% Rule: if upfront context consumption exceeds 15% of your context window, you have an architecture problem
  2. Progressive disclosure reduces context from 40,000 to 2,500 tokens (94% reduction) by loading tools only when needed
  3. File-based context transfer reduces token usage by 80% compared to in-memory context passing between agents
  4. Sub-agents should be researchers, not implementers - they gather context, the parent agent uses it
  5. Skills load 100 tokens upfront vs MCP's 10-15K tokens (99% reduction in upfront cost)
  6. Real cost savings: $2.40 per 100 queries with semantic search vs $86.40 with full context loading
  7. Semantic search provides 36x faster performance on large codebases compared to text search