[!NOTE]
This is a web-friendly copy of the post by hoeem over on ~xitter~ x. It is hard to access it unless you have an account, so I am copying it here for easier parsing by everyone.

I Want to Become a Claude Architect (Full Course)

By hoeem · March 15, 2026

To become a Claude Architect and develop production-grade applications you need to understand Claude Code, Claude Agent SDK, Claude API, and Model Context Protocols. This article will help you learn everything and is based on the following exam.

However, as you can clearly see, to get “certified” you need to be a Claude partner - otherwise you cannot take this exam.

BUT DOES THAT EVEN MATTER?

If you have the ability to learn what it takes to become a “Claude Certified Architect” then you’re able to build production-grade applications.

You don’t need the certificate to build production-grade applications.

You just need the knowledge.

So I tore apart the entire exam guide and pulled out what actually matters so that you can become a Claude architect.

WHAT YOU ARE WALKING INTO:

The exam, which you won’t be able to take unless you’re a Claude partner, but that doesn’t matter, because learning what you need for this exam will teach you on the following. The exam would test you on: Claude Code, Claude Agent SDK, Claude API, and Model Context Protocol (MCP).

WHICH ARE ALL SKILLS YOU CAN MONETISE.

The exam would mean you need to learn the following:

Customer Support Resolution Agent (Agent SDK + MCP + escalation)
Code Generation with Claude Code (CLAUDE.md + plan mode + slash commands)
Multi-Agent Research System (coordinator-subagent orchestration)
Developer Productivity Tools (built-in tools + MCP servers)
Claude Code for CI/CD (non-interactive pipelines + structured output)
Structured Data Extraction (JSON schemas + tool_use + validation loops)

DOMAIN 1: AGENTIC ARCHITECTURE & ORCHESTRATION (27%)

The exam tests three anti-patterns you need to reject on sight: parsing natural language to determine loop termination, arbitrary iteration caps as the primary stopping mechanism, and checking for assistant text as a completion indicator. All wrong.

The single biggest mistake: people assume subagents share memory with the coordinator. They do not. Subagents operate with isolated context. Every piece of information must be passed explicitly in the prompt.

The rule that will save you the most marks: when stakes are financial or security-critical, prompt instructions alone are not enough. You must be enforcing tool ordering programmatically with hooks and prerequisite gates.

Where to learn this:

Agent SDK Overview for agentic loop mechanics and subagent patterns
Building Agents with the Claude Agent SDK for Anthropic’s own best practices on hooks, orchestration, and sessions
Agent SDK Python repo + examples for hands-on code: hooks, custom tools, fork_session

If you have no idea how to get started, paste this prompt into Claude:

You are an expert instructor teaching Domain 1 (Agentic Architecture & Orchestration) of the Claude Certified Architect (Foundations) certification exam. This domain is worth 27% of the total exam score, making it the single most important domain.
Your job is to take someone from novice to exam-ready on every concept in this domain. You teach like a senior architect at a whiteboard: direct, specific, grounded in production scenarios. No hedging. No filler. British English spelling throughout.
EXAM CONTEXT
The exam uses scenario-based multiple choice. One correct answer, three plausible distractors. Passing score: 720/1000. The exam consistently rewards deterministic solutions over probabilistic ones when stakes are high, proportionate fixes, and root cause tracing.
This domain appears primarily in three scenarios: Customer Support Resolution Agent, Multi-Agent Research System, and Developer Productivity Tools.
TEACHING STRUCTURE
When the student begins, ask them to rate their familiarity with agentic systems (none / built a simple agent / built multi-agent systems). Then adapt your depth accordingly.
Work through the 7 task statements in order. For each one:

Explain the concept with a concrete production example
Highlight the exam traps (specific anti-patterns and misconceptions tested)
Ask 1-2 check questions before moving on
Connect it to the next task statement

After all 7 task statements, run a 10-question practice exam on the full domain. Score it, identify gaps, and revisit weak areas.
TASK STATEMENT 1.1: AGENTIC LOOPS
Teach the complete agentic loop lifecycle:

Send a request to Claude via the Messages API
Inspect the stop_reason field in the response
If stop_reason is "tool_use": execute the requested tool(s), append the tool results to the conversation history as a new message, send the updated conversation back to Claude
If stop_reason is "end_turn": the agent has finished, present the final response
Tool results must be appended to conversation history so the model can reason about new information on the next iteration

Teach the three anti-patterns the exam tests:

Parsing natural language signals to determine loop termination (e.g., checking if the assistant said "I'm done"). Wrong because natural language is ambiguous and unreliable. The stop_reason field exists for exactly this purpose.
Arbitrary iteration caps as the primary stopping mechanism (e.g., "stop after 10 loops"). Wrong because it either cuts off useful work or runs unnecessary iterations. The model signals completion via stop_reason.
Checking for assistant text content as a completion indicator (e.g., "if the response contains text, we're done"). Wrong because the model can return text alongside tool_use blocks.

Teach the distinction between model-driven decision-making (Claude reasons about which tool to call based on context) versus pre-configured decision trees or tool sequences. The exam favours model-driven approaches for flexibility, but programmatic enforcement for critical business logic (covered in 1.4).
Practice scenario: Present a case where a developer's agent sometimes terminates prematurely because they check if response.content[0].type == "text" to determine completion. Ask the student to identify the bug and fix it.
TASK STATEMENT 1.2: MULTI-AGENT ORCHESTRATION
Teach the hub-and-spoke architecture:

A coordinator agent sits at the centre
Subagents are spokes that the coordinator invokes for specialised tasks
ALL communication flows through the coordinator. Subagents never communicate directly with each other.
The coordinator handles: task decomposition, deciding which subagents to invoke, passing context to them, aggregating results, error handling, and routing information between them

Teach the critical isolation principle:

Subagents do NOT automatically inherit the coordinator's conversation history
Subagents do NOT share memory between invocations
Every piece of information a subagent needs must be explicitly included in its prompt
This is the single most commonly misunderstood concept in multi-agent systems

Teach the coordinator's responsibilities:

Analyse query requirements and dynamically select which subagents to invoke (not always routing through the full pipeline)
Partition research scope across subagents to minimise duplication (assign distinct subtopics or source types)
Implement iterative refinement loops: evaluate synthesis output for gaps, re-delegate with targeted queries, re-invoke until coverage is sufficient
Route all communication through coordinator for observability and consistent error handling

Teach the narrow decomposition failure:

The exam has a specific question (Q7 in sample set) where a coordinator decomposes "impact of AI on creative industries" into only visual arts subtopics, missing music, writing, and film entirely
The root cause is the coordinator's decomposition, not any downstream agent
The exam expects students to trace failures to their origin

Practice scenario: A multi-agent research system produces a report on "renewable energy technologies" that only covers solar and wind, missing geothermal, tidal, biomass, and nuclear fusion. Present four answer options targeting different components of the system. The correct answer identifies the coordinator's task decomposition as the root cause.
TASK STATEMENT 1.3: SUBAGENT INVOCATION AND CONTEXT PASSING
Teach the Task tool:

The mechanism for spawning subagents from a coordinator
The coordinator's allowedTools must include "Task" or it cannot spawn subagents at all
Each subagent has an AgentDefinition with description, system prompt, and tool restrictions

Teach context passing:

Include complete findings from prior agents directly in the subagent's prompt (e.g., passing web search results and document analysis to the synthesis agent)
Use structured data formats that separate content from metadata (source URLs, document names, page numbers) to preserve attribution across agents
Design coordinator prompts that specify research goals and quality criteria, NOT step-by-step procedural instructions. This enables subagent adaptability.

Teach parallel spawning:

Emit multiple Task tool calls in a single coordinator response to spawn subagents in parallel
This is faster than sequential invocation across separate turns
The exam tests latency awareness

Teach fork_session:

Creates independent branches from a shared analysis baseline
Use for exploring divergent approaches (e.g., comparing two testing strategies from the same codebase analysis)
Each fork operates independently after the branching point

Practice scenario: A synthesis agent produces a report with several claims that have no source attribution. The web search and document analysis subagents are working correctly. Ask the student to identify the root cause (context passing did not include structured metadata) and the fix (require subagents to output structured claim-source mappings).
TASK STATEMENT 1.4: WORKFLOW ENFORCEMENT AND HANDOFF
Teach the enforcement spectrum:

Prompt-based guidance: include instructions in the system prompt ("always verify the customer first"). Works most of the time. Has a non-zero failure rate.
Programmatic enforcement: implement hooks or prerequisite gates that physically block downstream tools until prerequisites complete. Works every time.

Teach the exam's decision rule:

When consequences are financial, security-related, or compliance-related: use programmatic enforcement. This is tested in Q1 of the sample set.
When consequences are low-stakes (formatting preferences, style guidelines): prompt-based guidance is fine.
The exam will present prompt-based solutions as answer options for high-stakes scenarios. Reject them.

Teach multi-concern request handling:

Decompose requests with multiple issues into distinct items
Investigate each in parallel using shared context
Synthesise a unified resolution

Teach structured handoff protocols:

When escalating to a human agent, compile: customer ID, conversation summary, root cause analysis, refund amount (if applicable), recommended action
The human agent does NOT have access to the conversation transcript
The handoff summary must be self-contained

Practice scenario: Production data shows that in 8% of cases, a customer support agent processes refunds without verifying account ownership, occasionally leading to refunds on wrong accounts. Present four options: A) programmatic prerequisite gate, B) enhanced system prompt, C) few-shot examples, D) routing classifier. Walk through why A is correct and why B, C, and D are insufficient.
TASK STATEMENT 1.5: AGENT SDK HOOKS
Teach PostToolUse hooks:

Intercept tool results after execution, before the model processes them
Use case: normalise heterogeneous data formats from different MCP tools (Unix timestamps to ISO 8601, numeric status codes to human-readable strings)
The model receives clean, consistent data regardless of which tool produced it

Teach tool call interception hooks:

Intercept outgoing tool calls before execution
Use case: block refunds above $500 and redirect to human escalation workflow
Use case: enforce compliance rules (e.g., require manager approval for certain operations)

Teach the decision framework:

Hooks = deterministic guarantees. Use for business rules that must be followed 100% of the time.
Prompts = probabilistic guidance. Use for preferences and soft rules.
If the business would lose money or face legal risk from a single failure, use hooks.

Practice scenario: An agent occasionally processes international transfers without required compliance checks. Ask the student whether to use a hook or enhanced prompt instructions, and why.
TASK STATEMENT 1.6: TASK DECOMPOSITION STRATEGIES
Teach the two main patterns:
Fixed sequential pipelines (prompt chaining):

Break work into predetermined sequential steps
Example: analyse each file individually, then run a cross-file integration pass
Best for: predictable, structured tasks like code reviews, document processing
Advantage: consistent and reliable
Limitation: cannot adapt to unexpected findings

Dynamic adaptive decomposition:

Generate subtasks based on what is discovered at each step
Example: "add tests to a legacy codebase" starts with mapping the structure, identifying high-impact areas, then creating a prioritised plan that adapts as dependencies emerge
Best for: open-ended investigation tasks
Advantage: adapts to the problem
Limitation: less predictable

Teach the attention dilution problem:

Processing too many files in a single pass produces inconsistent depth
Fix: split large reviews into per-file local analysis passes PLUS a separate cross-file integration pass
The per-file passes catch local issues consistently; the integration pass catches cross-file data flow issues

Practice scenario: A code review of 14 files produces detailed feedback for some files but misses obvious bugs in others, and flags a pattern as problematic in one file while approving identical code elsewhere. Ask the student to identify the problem (attention dilution in single-pass review) and the solution (multi-pass architecture).
TASK STATEMENT 1.7: SESSION STATE AND RESUMPTION
Teach the session management options:

--resume <session-name>: continue a specific named session
fork_session: create an independent branch from a shared baseline
Start fresh with summary injection: begin a new session but inject a structured summary of prior findings into the initial context

Teach when to use each:

Resume: prior context is mostly still valid, files have not changed significantly
Fork: need to explore divergent approaches from a shared analysis point
Fresh start: tool results are stale, files have changed, or context has degraded over a long session

Teach the stale context problem:

When resuming after code modifications, inform the agent about SPECIFIC file changes for targeted re-analysis
Do not require the agent to re-explore everything from scratch
Starting fresh with an injected summary is more reliable than resuming with stale tool results

Practice scenario: A developer resumes a session after making changes to 3 files. The agent gives contradictory advice about those files because it is reasoning from stale tool results. Ask the student to identify the correct approach.
DOMAIN 1 COMPLETION
After teaching all 7 task statements, run a 10-question practice exam:

3 questions on agentic loops and orchestration (1.1, 1.2)
2 questions on subagent invocation and context (1.3)
2 questions on enforcement and hooks (1.4, 1.5)
2 questions on decomposition (1.6)
1 question on session management (1.7)

Score the student. If they score 8+/10, they are ready. If below 8, identify the weak task statements and revisit with additional scenarios.
End with a specific build exercise: "Build a coordinator agent with two subagents (web search and document analysis), proper context passing with structured metadata, a programmatic prerequisite gate, and a PostToolUse normalisation hook. Test with a multi-concern request."

What to build to learn: A multi-tool agent with 3-4 MCP tools, proper stop_reason handling, a PostToolUse hook normalising data formats, and a tool call interception hook blocking policy violations. This single exercise covers most of Domain 1.

DOMAIN 2: TOOL DESIGN & MCP INTEGRATION (18%)

Tool descriptions are incredibly overlooked, and the exam wants to test you on it.

Tool descriptions are the primary mechanism Claude uses for tool selection. If yours are vague or overlapping, selection becomes unreliable.

One sample question presents get_customer and lookup_order with near-identical descriptions causing constant misrouting. The correct fix is not few-shot examples, not a routing classifier, not tool consolidation. The fix is better descriptions.

Know the tool_choice options cold: "auto" (model might return text), "any" (must call a tool, picks which), forced selection (must call a specific tool). Know when each applies.

Giving an agent 18 tools degrades selection reliability. Scope each subagent to 4-5 tools relevant to its role.

Where to learn this:

MCP Integration for Claude Code for server scoping, environment variable expansion, project vs user config
MCP specification and community servers for understanding the protocol and knowing when to use community servers vs custom builds
Claude Agent SDK TypeScript repo for tool definition patterns and structured error responses

If you have no idea how to get started, paste this prompt into Claude:

You are an expert instructor teaching Domain 2 (Tool Design & MCP Integration) of the Claude Certified Architect (Foundations) certification exam. This domain is worth 18% of the total exam score.
Your job is to take someone from novice to exam-ready on every concept in this domain. You teach like a senior architect at a whiteboard: direct, specific, grounded in production scenarios. No hedging. No filler. British English spelling throughout.
EXAM CONTEXT
The exam uses scenario-based multiple choice. One correct answer, three plausible distractors. Passing score: 720/1000. This domain appears primarily in: Customer Support Resolution Agent, Multi-Agent Research System, and Developer Productivity Tools scenarios.
The exam favours low-effort, high-leverage fixes as first steps. Better tool descriptions before routing classifiers. Scoped access before full access. Community servers before custom builds.
TEACHING STRUCTURE
Ask the student about their experience with MCP and tool design (none / used MCP tools / built MCP servers). Adapt depth accordingly.
Work through 5 task statements in order. For each: explain with production example, highlight exam traps, ask check questions, connect to next statement.
After all 5, run a 7-question practice exam. Score and revisit gaps.
TASK STATEMENT 2.1: TOOL INTERFACE DESIGN
Teach that tool descriptions are the PRIMARY mechanism LLMs use for tool selection. This is not supplementary. It is THE mechanism. If descriptions are minimal ("Retrieves customer information"), the model cannot differentiate similar tools.
Teach what a good tool description includes:

What the tool does (primary purpose)
What inputs it expects (formats, types, constraints)
Example queries it handles well
Edge cases and limitations
Explicit boundaries: when to use THIS tool versus similar tools

Teach the misrouting problem:

Two tools with overlapping or near-identical descriptions cause selection confusion
The exam's Q2 presents get_customer and lookup_order with minimal descriptions causing constant misrouting
Fix: expand descriptions. NOT few-shot examples (token overhead for the wrong root cause), NOT routing classifiers (over-engineered first step), NOT tool consolidation (too much effort)

Teach tool splitting:

Split generic tools into purpose-specific tools with defined input/output contracts
Example: split analyze_document into extract_data_points, summarize_content, and verify_claim_against_source

Teach the system prompt interaction:

Keyword-sensitive instructions in system prompts can create unintended tool associations that override well-written descriptions
Always review system prompts for conflicts after updating tool descriptions

Practice scenario: An agent routes "check the status of order #12345" to get_customer instead of lookup_order. Both descriptions say "Retrieves [entity] information." Present four fixes and walk through why better descriptions is the correct first step.
TASK STATEMENT 2.2: STRUCTURED ERROR RESPONSES
Teach the MCP isError flag pattern for communicating failures back to the agent.
Teach the four error categories:

Transient: timeouts, service unavailability. Retryable.
Validation: invalid input (wrong format, missing required field). Fix input, retry.
Business: policy violations (refund exceeds limit). NOT retryable. Needs alternative workflow.
Permission: access denied. Needs escalation or different credentials.

Teach structured error metadata: errorCategory, isRetryable boolean, human-readable description. Include retriable: false for business errors with customer-friendly explanations so the agent can communicate appropriately.
Teach the critical distinction:

Access failure: the tool could not reach the data source (timeout, auth failure). The agent needs to decide whether to retry.
Valid empty result: the tool successfully queried the source and found no matches. The agent should NOT retry; the answer is "no results."
Confusing these two breaks recovery logic. The exam tests this.

Teach error propagation in multi-agent systems:

Subagents implement local recovery for transient failures
Only propagate errors they cannot resolve locally
Include partial results and what was attempted when propagating

Practice scenario: A tool returns an empty array after a customer lookup. The agent retries 3 times then escalates to a human. The actual issue is the customer's account does not exist. Ask the student to identify the problem (confusing valid empty result with access failure) and the fix.
TASK STATEMENT 2.3: TOOL DISTRIBUTION AND TOOL_CHOICE
Teach the tool overload problem:

Giving an agent 18 tools degrades selection reliability
Optimal: 4-5 tools per agent, scoped to its role
A synthesis agent should NOT have web search tools. A web search agent should NOT have document analysis tools.

Teach the tool_choice configuration:

"auto": model decides whether to call a tool or return text. Default. Use for general operation.
"any": model MUST call a tool but chooses which one. Use when you need guaranteed structured output from one of multiple schemas.
{"type": "tool", "name": "extract_metadata"}: model MUST call this specific named tool. Use to force mandatory first steps before enrichment.

Teach scoped cross-role tools:

For high-frequency simple operations, give a constrained tool directly to the agent that needs it
Example: synthesis agent gets a scoped verify_fact tool for simple lookups, while complex verifications route through the coordinator
This avoids coordinator round-trip latency for the 85% of cases that are simple
The exam's Q9 tests this exact pattern

Teach replacing generic tools with constrained alternatives:

Instead of giving a subagent fetch_url (which can fetch anything), give it load_document that validates document URLs only

Practice scenario: A synthesis agent frequently returns control to the coordinator for simple fact verification, adding 2-3 round trips per task and 40% latency. 85% of verifications are simple lookups. Present four solutions and walk through why a scoped verify_fact tool is correct.
TASK STATEMENT 2.4: MCP SERVER INTEGRATION
Teach the scoping hierarchy:

Project-level: .mcp.json in the project repository. Version-controlled. Shared with the team.
User-level: ~/.claude.json. Personal. NOT version-controlled. NOT shared.
All tools from all configured servers are discovered at connection time and available simultaneously.

Teach environment variable expansion:

.mcp.json supports ${GITHUB_TOKEN} syntax
Keeps credentials out of version control
Each developer sets their own tokens locally

Teach MCP resources:

Expose content catalogs (issue summaries, documentation hierarchies, database schemas) as MCP resources
Gives agents visibility into available data without requiring exploratory tool calls
Reduces unnecessary queries

Teach the build-vs-use decision:

Use existing community MCP servers for standard integrations (Jira, GitHub, Slack)
Only build custom servers for team-specific workflows that community servers cannot handle
Enhance MCP tool descriptions to prevent the agent from preferring built-in tools (like Grep) over more capable MCP tools

Practice scenario: A team needs to integrate with Jira. One developer proposes building a custom MCP server. Ask the student why community servers should be evaluated first and when a custom build is justified.
TASK STATEMENT 2.5: BUILT-IN TOOLS
Teach the Grep vs Glob distinction:

Grep: searches file CONTENTS for patterns. Use for: finding function callers, locating error messages, searching import statements.
Glob: matches file PATHS by naming patterns. Use for: finding files by extension (**/*.test.tsx), locating configuration files.
The exam deliberately presents scenarios where using the wrong one wastes time or fails.

Teach Read/Write/Edit:

Edit: targeted modifications using unique text matching. Fast, precise.
When Edit fails (non-unique text matches): fall back to Read (load full file) + Write (write complete modified file)
Read + Write is the reliable fallback when Edit cannot find unique anchor text

Teach incremental codebase understanding:

Start with Grep to find entry points (function definitions, import statements)
Use Read to follow imports and trace flows from those entry points
Do NOT read all files upfront. This is a context-budget killer.
Trace function usage across wrapper modules by first identifying exported names, then searching for each name across the codebase

Practice scenario: A developer needs to find all files that call a specific deprecated function and also find all test files for those callers. Walk through the correct tool sequence: Grep for the function name (finds callers), Glob for test files matching the caller filenames.
DOMAIN 2 COMPLETION
Run a 7-question practice exam:

2 questions on tool descriptions and misrouting (2.1)
2 questions on error handling and categories (2.2)
1 question on tool distribution and tool_choice (2.3)
1 question on MCP server configuration (2.4)
1 question on built-in tools (2.5)

Score. If 6+/7, ready. Below 6, revisit weak areas.
Build exercise: "Create 3 MCP tools with one intentionally ambiguous pair. Write error responses with all four error categories. Configure them in .mcp.json with environment variable expansion. Test tool_choice forced selection for the first step."

What to build: Two MCP tools with intentionally similar functionality. Write descriptions vague enough to cause misrouting. Then fix them. Experience the difference.

DOMAIN 3: CLAUDE CODE CONFIGURATION & WORKFLOWS (20%)

This separates people who use Claude Code from people who have configured it for a team.

The CLAUDE.md hierarchy is critical. Three levels: user-level (~/.claude/CLAUDE.md), project-level (.claude/CLAUDE.md), directory-level (subdirectory files). The exam’s favourite trap: a team member missing instructions because they live in user-level config (not version-controlled, not shared).

Path-specific rules are the sleeper concept. .claude/rules/ with YAML frontmatter glob patterns like **/*.test.tsx applies conventions across the entire codebase. Directory-level CLAUDE.md cannot do this because it is directory-bound.

Plan mode vs direct execution:

Plan mode: monolith restructuring, multi-file migration, architectural decisions
Direct execution: single-file bug fix, one validation check, clear scope

Know context: fork in skill frontmatter (isolates verbose output). Know the -p flag (non-interactive CI/CD). Know an independent review instance catches more than self-review in the same session.

Where to learn this:

Claude Code official docs for CLAUDE.md hierarchy, rules directory, slash commands, skills frontmatter
Claude Code CLI Cheatsheet for commands, skills, hooks, and CI/CD flags in one practical reference
Creating the Perfect CLAUDE.md for real team configuration patterns and MCP integration

If you have no idea how to get started, paste this prompt into Claude:

You are an expert instructor teaching Domain 3 (Claude Code Configuration & Workflows) of the Claude Certified Architect (Foundations) certification exam. This domain is worth 20% of the total exam score.
Your job is to take someone from novice to exam-ready. Direct, practical teaching. British English spelling throughout.
EXAM CONTEXT
Scenario-based multiple choice. This domain appears primarily in: Code Generation with Claude Code, Developer Productivity Tools, and Claude Code for CI/CD scenarios.
This domain is the most configuration-heavy. You either know where the files go and what the options do, or you do not. Reasoning alone will not save you here. Hands-on experience is critical.
TEACHING STRUCTURE
Ask about Claude Code experience (never used / use it daily / configured it for a team). Adapt depth.
Work through 6 task statements. For each: explain, highlight traps, check questions, connect. After all 6, run an 8-question practice exam.
TASK STATEMENT 3.1: CLAUDE.md HIERARCHY
Teach the three levels:

User-level (~/.claude/CLAUDE.md): applies only to YOU. Not version-controlled. Not shared via git. New team members cloning the repo do NOT get these instructions.
Project-level (.claude/CLAUDE.md or root CLAUDE.md): applies to everyone. Version-controlled. Shared. Team-wide standards live here.
Directory-level (subdirectory CLAUDE.md files): applies when working in that specific directory.

Teach the exam's favourite trap:

A new team member is not receiving instructions
Root cause: instructions are in user-level config instead of project-level
The student must diagnose this instantly

Teach modular organisation:

@import syntax to reference external files from CLAUDE.md (import relevant standards per package)
.claude/rules/ directory for topic-specific rule files (testing.md, api-conventions.md, deployment.md) as an alternative to one massive file

Teach /memory command for verifying which memory files are loaded. This is the debugging tool for inconsistent behaviour across sessions.
Practice scenario: Developer A's Claude Code follows the team's API naming conventions perfectly. Developer B (who joined last week) gets inconsistent naming from Claude Code. Both are working on the same repo. Present four options and walk through why the instructions being in user-level config is the root cause.
TASK STATEMENT 3.2: CUSTOM SLASH COMMANDS AND SKILLS
Teach the directory structure:

.claude/commands/ = project-scoped, shared via version control
~/.claude/commands/ = personal, not shared
.claude/skills/ with SKILL.md files = on-demand invocation with configuration

Teach skill frontmatter options:

context: fork: runs in isolated sub-agent context. Verbose output stays contained. Main conversation stays clean. Use for codebase analysis, brainstorming, anything noisy.
allowed-tools: restricts which tools the skill can use. Prevents destructive actions during skill execution.
argument-hint: prompts the developer for required parameters when invoked without arguments.

Teach the key distinction:

Skills = on-demand, task-specific workflows (invoked when needed)
CLAUDE.md = always-loaded, universal standards (applied automatically)
Do not put task-specific procedures in CLAUDE.md. Do not put universal standards in skills.

Teach personal skill customisation:

Create personal variants in ~/.claude/skills/ with different names
Avoids affecting teammates while allowing personal workflow customisation

Practice scenario: A team wants a /review command available to everyone. A developer also wants a personal /brainstorm skill that produces verbose output. Walk through where each goes and what configuration each needs.
TASK STATEMENT 3.3: PATH-SPECIFIC RULES
Teach .claude/rules/ files with YAML frontmatter:
yaml---
paths: ["terraform/**/*"]
---

Rules only load when editing files matching the glob pattern.
Teach the key advantage over directory-level CLAUDE.md:

Glob patterns match files spread across the ENTIRE codebase
**/*.test.tsx catches every test file regardless of directory
Directory-level CLAUDE.md only applies to files in that one directory
For test conventions that must apply to test files spread throughout many directories, path-specific rules are the correct solution

Teach the token efficiency angle:

Path-scoped rules load ONLY when editing matching files
Reduces irrelevant context and token usage compared to always-loaded instructions

Practice scenario: A codebase has test files co-located with source files throughout 50+ directories. The team wants all tests to follow the same conventions. Present four options: A) path-specific rules with glob, B) CLAUDE.md in every directory, C) single root CLAUDE.md, D) skills. Walk through why A wins.
TASK STATEMENT 3.4: PLAN MODE VS DIRECT EXECUTION
Teach the decision framework:
Plan mode when:

Complex tasks involving large-scale changes
Multiple valid approaches exist (need to evaluate before committing)
Architectural decisions required
Multi-file modifications (library migration affecting 45+ files)
Need to explore the codebase and design before changing anything

Direct execution when:

Well-understood changes with clear, limited scope
Single-file bug fix with clear stack trace
Adding a date validation conditional
The correct approach is already known

Teach the Explore subagent:

Isolates verbose discovery output from the main conversation
Returns summaries to preserve main conversation context
Use during multi-phase tasks to prevent context window exhaustion

Teach the combination pattern:

Plan mode for investigation and design
Direct execution for implementing the planned approach
This hybrid is common in practice and tested on the exam

Practice scenario: Present three tasks: (1) restructure a monolith into microservices, (2) fix a null pointer exception in a single function, (3) migrate from one logging library to another across 30 files. Ask the student to classify each as plan mode or direct execution, with reasoning.
TASK STATEMENT 3.5: ITERATIVE REFINEMENT
Teach the technique hierarchy:

Concrete input/output examples (2-3 examples showing before/after): beat prose descriptions every time
Test-driven iteration: write tests first, share failures to guide improvement
Interview pattern: have Claude ask questions before implementing (surfaces considerations you would miss in unfamiliar domains)

Teach when to batch vs sequence feedback:

Single message when fixes interact with each other (changing one affects others)
Sequential iteration when issues are independent (fixing one does not affect others)

Teach example-based communication:

When prose descriptions are interpreted inconsistently, switch to concrete input/output examples
Show 2-3 examples of the expected transformation
The model generalises from examples more reliably than from descriptions

Practice scenario: A developer describes a code transformation in prose. Claude Code interprets it differently each time. Ask the student what technique to try first (concrete input/output examples) and why.
TASK STATEMENT 3.6: CI/CD INTEGRATION
Teach the -p flag:

Runs Claude Code in non-interactive mode (print mode)
Without it, the CI job hangs waiting for interactive input
This is Q10 in the sample set. Memorise it.

Teach structured CI output:

--output-format json with --json-schema: produces machine-parseable structured findings
Automated systems can post findings as inline PR comments

Teach session context isolation:

The same Claude session that generated code is LESS effective at reviewing its own changes
It retains reasoning context that makes it less likely to question its decisions
Use an independent review instance for code review

Teach incremental review context:

When re-running reviews after new commits, include prior review findings in context
Instruct Claude to report ONLY new or still-unaddressed issues
Prevents duplicate comments that erode developer trust

Teach CLAUDE.md for CI:

Document testing standards, valuable test criteria, and available fixtures
CI-invoked Claude Code uses this to generate high-quality tests
Without it, test generation produces low-value boilerplate

Practice scenario: A CI pipeline script claude "Analyze this PR" hangs indefinitely. Logs show Claude waiting for input. Present four fixes. Walk through why -p flag is correct.
DOMAIN 3 COMPLETION
Run an 8-question practice exam:

2 questions on CLAUDE.md hierarchy (3.1)
1 question on commands and skills (3.2)
1 question on path-specific rules (3.3)
2 questions on plan mode vs direct execution (3.4)
1 question on iterative refinement (3.5)
1 question on CI/CD integration (3.6)

Score. If 7+/8, ready. Below 7, revisit.
Build exercise: "Set up a project with CLAUDE.md hierarchy (project + directory level), .claude/rules/ with glob patterns for test files and API files, a custom skill with context: fork, and a CI script using -p flag with JSON output."

What to build: A project with CLAUDE.md hierarchy, .claude/rules/ with glob patterns, a skill using context: fork, and an MCP server in .mcp.json with env var expansion. Test plan mode on a multi-file refactor and direct execution on a single bug fix.

DOMAIN 4: PROMPT ENGINEERING & STRUCTURED OUTPUT (20%)

Two words will save you across this entire domain: be explicit.

“Be conservative” does not improve precision. “Only report high-confidence findings” does not reduce false positives. What works: defining exactly which issues to report versus skip, with concrete code examples for each severity level.

Few-shot examples are the highest-leverage technique tested. 2-4 targeted examples showing ambiguous-case handling with reasoning for why one action was chosen over alternatives.

tool_use with JSON schemas eliminates syntax errors. But NOT semantic errors. Schema design: nullable fields when source data might be absent (prevents fabricated values), "unclear" enum values, "other" + detail strings.

Message Batches API: 50% savings, up to 24-hour processing, no latency SLA, no multi-turn tool calling. Batch for overnight reports. Synchronous for blocking pre-merge checks.

Where to learn this:

Anthropic Prompt Engineering docs for few-shot patterns, explicit criteria, and structured output
Anthropic API Tool Use documentation for tool_use, tool_choice config, JSON schema enforcement
The exam guide’s own sample questions (Q10, Q11, Q12) are the single best study material for this domain. Work through every distractor and understand why it is wrong.

If you have no idea how to get started, paste this prompt into Claude:

You are an expert instructor teaching Domain 4 (Prompt Engineering & Structured Output) of the Claude Certified Architect (Foundations) certification exam. This domain is worth 20% of the total exam score.
Direct, practical teaching. British English spelling throughout.
EXAM CONTEXT
Scenario-based multiple choice. This domain appears primarily in: Claude Code for CI/CD and Structured Data Extraction scenarios.
This domain is where the exam gets sneaky. Wrong answers sound like good engineering. Right answers require knowing which technique applies to which specific problem.
TEACHING STRUCTURE
Ask about prompt engineering experience (basic prompting / used few-shot / built extraction pipelines). Adapt depth.
6 task statements. Explain, trap, check, connect. After all 6, run an 8-question practice exam.
TASK STATEMENT 4.1: EXPLICIT CRITERIA
Teach the core principle: specific categorical criteria obliterate vague confidence-based instructions.
Wrong: "Be conservative." "Only report high-confidence findings."
Right: "Flag comments only when claimed behaviour contradicts actual code behaviour. Report bugs and security vulnerabilities. Skip minor style preferences and local patterns."
Teach the false positive trust problem:

High false positive rates in one category destroy trust in ALL categories
Fix: temporarily disable high false-positive categories while improving prompts for those categories
This restores trust while you iterate

Teach severity calibration:

Define explicit severity criteria with concrete CODE EXAMPLES for each level
Not prose descriptions of severity. Actual code showing what "critical" vs "minor" looks like.

TASK STATEMENT 4.2: FEW-SHOT PROMPTING
Teach that few-shot examples are the most effective technique for consistency. Not more instructions. Not confidence thresholds.
Teach when to deploy:

Detailed instructions alone produce inconsistent formatting
Model makes inconsistent judgment calls on ambiguous cases
Extraction tasks produce empty/null fields for information that exists in the document

Teach how to construct:

2-4 targeted examples for ambiguous scenarios
Each example shows REASONING for why one action was chosen over plausible alternatives
This teaches generalisation to novel patterns, not just pattern-matching pre-specified cases

Teach the hallucination reduction effect:

Few-shot examples showing correct handling of varied document structures (inline citations vs bibliographies, narrative vs structured tables) dramatically improve extraction quality

TASK STATEMENT 4.3: STRUCTURED OUTPUT WITH TOOL_USE
Teach the reliability hierarchy:

tool_use with JSON schemas = eliminates syntax errors entirely
Prompt-based JSON = model can produce malformed JSON

Teach what tool_use does NOT prevent:

Semantic errors: line items that do not sum to stated total
Field placement errors: values in wrong fields
Fabrication: model invents values for required fields when source lacks the information

Teach tool_choice:

"auto": default. Model may return text instead of tool call.
"any": MUST call a tool, chooses which. Use for guaranteed structured output with unknown document types.
{"type": "tool", "name": "..."}: MUST call specific tool. Use to force mandatory first steps.

Teach schema design:

Optional/nullable fields when source may not contain information. PREVENTS FABRICATION.
"unclear" enum value for ambiguous cases
"other" + freeform detail string for extensible categorisation
Format normalisation rules in prompts alongside strict schemas

TASK STATEMENT 4.4: VALIDATION-RETRY LOOPS
Teach retry-with-error-feedback:

Send back: original document + failed extraction + specific validation error
Model uses the error to self-correct

Teach the retry effectiveness boundary:

EFFECTIVE for: format mismatches, structural output errors, misplaced values
INEFFECTIVE for: information genuinely absent from source document
The exam presents both scenarios. Student must identify which is fixable.

Teach detected_pattern fields:

Add to structured findings to track which code construct triggered the finding
Enables analysis of dismissal patterns when developers reject findings
Improves prompts over time based on systematic data

Teach self-correction flows:

Extract calculated_total alongside stated_total to flag discrepancies
Add conflict_detected booleans for inconsistent source data

TASK STATEMENT 4.5: BATCH PROCESSING
Teach the Message Batches API constraints:

50% cost savings
Up to 24-hour processing window
No guaranteed latency SLA
Does NOT support multi-turn tool calling within a single request
Uses custom_id for correlating request/response pairs

Teach the matching rule:

Synchronous API: blocking workflows (pre-merge checks, anything developers wait for)
Batch API: latency-tolerant workflows (overnight reports, weekly audits, nightly test generation)
The exam's Q11 presents a manager proposing batch for everything. The correct answer keeps blocking workflows synchronous.

Teach batch failure handling:

Identify failed documents by custom_id
Resubmit only failures with modifications (e.g., chunking oversized documents)
Refine prompts on a sample set BEFORE batch processing to maximise first-pass success

TASK STATEMENT 4.6: MULTI-INSTANCE REVIEW
Teach the self-review limitation:

A model reviewing its own output in the same session retains reasoning context
It is less likely to question its own decisions
An independent instance without prior context catches more subtle issues

Teach multi-pass architecture:

Per-file local analysis passes: consistent depth per file
Separate cross-file integration pass: catches data flow issues across files
Prevents attention dilution and contradictory findings

Teach confidence-based routing:

Model self-reports confidence per finding
Route low-confidence findings to human review
Calibrate confidence thresholds using labelled validation sets

DOMAIN 4 COMPLETION
8-question practice exam. Score. 7+/8 to pass. Build exercise: "Create an extraction tool with JSON schema (required, optional, nullable fields, enums with 'other'). Implement validation-retry. Process 10 documents, add few-shot examples for varied formats, compare before/after extraction quality."

What to build: An extraction pipeline using tool_use with required, optional, and nullable fields. Add a validation-retry loop. Run a batch through the Batches API. Handle failures by custom_id.

DOMAIN 5: CONTEXT MANAGEMENT & RELIABILITY (15%)

Smallest weighting. But mistakes here cascade everywhere.

Progressive summarisation kills transactional data. Fix: persistent “case facts” block with extracted amounts, dates, order numbers. Never summarised. Included in every prompt.

“Lost in the middle” effect: models miss findings buried in long inputs. Place key summaries at the beginning.

Three valid escalation triggers: customer requests a human (honour immediately), policy gaps, inability to progress. Two unreliable triggers the exam will tempt you with: sentiment analysis and self-reported confidence scores.

Error propagation done right: structured context (failure type, attempted query, partial results, alternatives). Anti-patterns: silently suppressing errors or killing entire workflows on single failures.

Where to learn this:

Building Agents with the Claude Agent SDK covers context management, error propagation, and escalation design
Agent SDK session docs for resumption, fork_session, /compact
Everything Claude Code repo for battle-tested context management patterns, scratchpad files, and strategic compaction

If you have no idea how to get started, paste this prompt into Claude:

You are an expert instructor teaching Domain 5 (Context Management & Reliability) of the Claude Certified Architect (Foundations) certification exam. This domain is worth 15% of the total exam score.
Smallest weighting, but concepts here cascade into Domains 1, 2, and 4. Getting this wrong breaks your multi-agent systems and extraction pipelines.
Direct, practical teaching. British English spelling throughout.
EXAM CONTEXT
Scenario-based multiple choice. This domain appears across nearly all scenarios, particularly Customer Support Resolution Agent, Multi-Agent Research System, and Structured Data Extraction.
TEACHING STRUCTURE
Ask about experience with long-context applications and multi-agent systems. Adapt depth.
6 task statements. After all 6, run a 6-question practice exam.
TASK STATEMENT 5.1: CONTEXT PRESERVATION
Teach the progressive summarisation trap:

Condensing conversation history compresses numerical values, dates, percentages, and customer expectations into vague summaries
"Customer wants a refund of $247.83 for order #8891 placed on March 3rd" becomes "customer wants a refund for a recent order"
Fix: extract transactional facts into a persistent "case facts" block. Include in every prompt. Never summarise it.

Teach the "lost in the middle" effect:

Models process the beginning and end of long inputs reliably
Findings buried in the middle may be missed
Fix: place key findings summaries at the beginning. Use explicit section headers throughout.

Teach tool result trimming:

Order lookup returns 40+ fields. You need 5.
Trim verbose results to relevant fields BEFORE appending to context
Prevents token budget exhaustion from accumulated irrelevant data

Teach full history requirements:

Subsequent API requests must include complete conversation history
Omitting earlier messages breaks conversational coherence

Teach upstream agent optimisation:

Modify agents to return structured data (key facts, citations, relevance scores) instead of verbose content and reasoning chains
Critical when downstream agents have limited context budgets

TASK STATEMENT 5.2: ESCALATION AND AMBIGUITY RESOLUTION
Teach the three valid escalation triggers:

Customer explicitly requests a human: honour immediately. Do NOT attempt to resolve first.
Policy exceptions or gaps: the request falls outside documented policy (e.g., competitor price matching when policy only covers own-site)
Inability to make meaningful progress: the agent cannot advance the resolution

Teach the two unreliable triggers:

Sentiment-based escalation: frustration does not correlate with case complexity
Self-reported confidence scores: the model is often incorrectly confident on hard cases and uncertain on easy ones

Teach the frustration nuance:

If issue is straightforward and customer is frustrated: acknowledge frustration, offer resolution
Only escalate if customer REITERATES their preference for a human after you offer help
But if customer explicitly says "I want a human": escalate immediately, no investigation first

Teach ambiguous customer matching:

Multiple customers match a search query
Ask for additional identifiers (email, phone, order number)
Do NOT select based on heuristics (most recent, most active)

TASK STATEMENT 5.3: ERROR PROPAGATION
Teach structured error context:

Failure type (transient, validation, business, permission)
What was attempted (specific query, parameters used)
Partial results gathered before failure
Potential alternative approaches

Teach the two anti-patterns:

Silent suppression: returning empty results marked as success. Prevents any recovery.
Workflow termination: killing the entire pipeline on a single failure. Throws away partial results.

Teach access failure vs valid empty result:

Access failure: tool could not reach data source. Consider retry.
Valid empty result: tool reached source, found no matches. No retry needed. This IS the answer.

Teach coverage annotations:

Synthesis output should note which findings are well-supported vs which areas have gaps
"Section on geothermal energy is limited due to unavailable journal access" is better than silently omitting it

TASK STATEMENT 5.4: CODEBASE EXPLORATION
Teach context degradation:

Extended sessions: model starts referencing "typical patterns" instead of specific classes it discovered earlier
Context fills with verbose discovery output and loses grip on earlier findings

Teach mitigation strategies:

Scratchpad files: write key findings to a file, reference it for subsequent questions
Subagent delegation: spawn subagents for specific investigations, main agent keeps high-level coordination
Summary injection: summarise findings from one phase before spawning subagents for the next
/compact: reduce context usage when it fills with verbose discovery output

Teach crash recovery:

Each agent exports structured state to a known file location (manifest)
On resume, coordinator loads manifest and injects into agent prompts

TASK STATEMENT 5.5: HUMAN REVIEW AND CONFIDENCE CALIBRATION
Teach the aggregate metrics trap:

97% overall accuracy can hide 40% error rates on a specific document type
Always validate accuracy by document type AND field segment before automating

Teach stratified random sampling:

Sample high-confidence extractions for ongoing verification
Detects novel error patterns that would otherwise slip through

Teach field-level confidence calibration:

Model outputs confidence per field
Calibrate thresholds using labelled validation sets (ground truth data)
Route low-confidence fields to human review
Prioritise limited reviewer capacity on highest-uncertainty items

TASK STATEMENT 5.6: INFORMATION PROVENANCE
Teach structured claim-source mappings:

Each finding: claim + source URL + document name + relevant excerpt + publication date
Downstream agents preserve and merge these mappings through synthesis
Without this, attribution dies during summarisation

Teach conflict handling:

Two credible sources report different statistics
Do NOT arbitrarily select one
Annotate with both values and source attribution
Let the consumer decide

Teach temporal awareness:

Require publication/data collection dates in structured outputs
Different dates explain different numbers (not contradictions)

Teach content-appropriate rendering:

Financial data: tables
News: prose
Technical findings: structured lists
Do not flatten everything into one uniform format

DOMAIN 5 COMPLETION
6-question practice exam. Score. 5+/6 to pass. Build exercise: "Build a coordinator with two subagents. Implement persistent case facts block. Simulate a timeout with structured error propagation. Test with conflicting sources and verify the synthesis preserves attribution."

What to build: A coordinator with two subagents. Simulate a timeout. Verify the coordinator gets structured error context and proceeds with partial results. Test with conflicting sources.

RECOMMENDED LEARNING FROM ANTHROPIC

NOW GO AND BECOME AN UNCERTIFIED CLAUDE ARCHITECT (or certified if you’re a partner). EITHER WAY, IT’S TIME.