Multi-Agent Design Patterns
Natural-language orchestration
Tirea uses natural-language orchestration, inspired by Claude Code’s sub-agent model. The LLM decides when to delegate, to whom, and how to combine results. You define each agent’s identity and access policy; the runtime handles everything else. There are no DAGs, no state machines, and no explicit routing code — unlike frameworks such as LangGraph or Google ADK where you wire agents into graphs and define transitions in code.
This works because the runtime provides:
- Agent registry — agents registered at build time are rendered into the system prompt, so the LLM always knows who it can delegate to
- Background execution with completion notifications — sub-agents run in the background; the runtime injects their status after each tool call, keeping the LLM aware of what’s running, finished, or failed
- Foreground and background modes — block until a sub-agent finishes, or run multiple concurrently and receive completion notifications
- Thread isolation — each sub-agent runs in its own thread with independent state
- Orphan recovery — orphaned sub-agents are detected and resumed on restart
Patterns
All patterns below are implemented with the same building blocks:
agent_run/agent_stop/agent_outputdelegation toolsAgentDefinitionwithallowed_agents/excluded_agentsSuspendTicketfor human-in-the-loop gating- System prompt engineering for control flow
No dedicated workflow agent types are needed. The LLM-driven loop plus delegation tools cover these patterns through prompt configuration.
Pattern Index
| Pattern | Tirea Mechanism | Deterministic Flow? |
|---|---|---|
| Coordinator | agent_run + prompt routing | No (LLM decides) |
| Sequential Pipeline | Chained agent_run calls | No (LLM relays) |
| Parallel Fan-Out/Gather | Parallel agent_run + agent_output | No (best-effort) |
| Hierarchical Decomposition | Nested agent_run | No |
| Generator-Critic | Generator as main agent, critic as child via agent_run | No |
| Iterative Refinement | Same as Generator-Critic with additional refiner child | No |
| Human-in-the-Loop | SuspendTicket + PermissionPlugin | Yes (runtime gating) |
| Swarm / Peer Handoff | TODO — not yet supported | — |
Coordinator
A single orchestrator agent analyzes user intent and routes to specialized worker agents.
User -> [Orchestrator] -> intent analysis
-> agent_run("billing") if billing question
-> agent_run("support") if technical issue
-> agent_run("sales") if pricing question
Setup
let os = AgentOs::builder()
.with_agent_spec(AgentDefinitionSpec::local_with_id("billing", AgentDefinition::new("deepseek-chat")
.with_system_prompt("You are a billing specialist.")
.with_excluded_tools(vec!["agent_run".to_string(), "agent_stop".to_string()])))
.with_agent_spec(AgentDefinitionSpec::local_with_id("support", AgentDefinition::new("deepseek-chat")
.with_system_prompt("You are a technical support specialist.")
.with_excluded_tools(vec!["agent_run".to_string(), "agent_stop".to_string()])))
.with_agent_spec(AgentDefinitionSpec::local_with_id("orchestrator", AgentDefinition::new("deepseek-chat")
.with_system_prompt("Route user requests to the appropriate specialist:
- billing: payment, invoice, subscription issues
- support: technical problems, bugs, errors
Use agent_run to delegate. Use agent_output to read results.")
.with_allowed_agents(vec!["billing".to_string(), "support".to_string()])))
.build()?;
Key decisions
- Worker agents should exclude delegation tools to prevent recursive delegation.
- The orchestrator prompt should list available agents and their responsibilities.
- Use foreground mode (
background=false) when the user expects a direct answer.
Sequential Pipeline
Multiple agents execute in order, each transforming the output of the previous stage.
[Parser Agent] -> raw text
-> [Extractor Agent] -> structured data
-> [Summarizer Agent] -> final report
Setup
The orchestrator prompt drives the sequence:
.with_agent_spec(AgentDefinitionSpec::local_with_id("orchestrator", AgentDefinition::new("deepseek-chat")
.with_system_prompt("Process the document through three stages in order:
1. Call agent_run(\"parser\") with the raw input. Read output with agent_output.
2. Call agent_run(\"extractor\") with the parsed text. Read output with agent_output.
3. Call agent_run(\"summarizer\") with the extracted data. Read output with agent_output.
Return the final summary to the user.")
.with_allowed_agents(vec!["parser".to_string(), "extractor".to_string(), "summarizer".to_string()])))
Limitations
The orchestrator acts as a deterministic relay: call stage N, read output, pass to stage N+1. This relay logic does not require LLM reasoning, yet each relay step consumes an LLM turn. For long pipelines the token and latency overhead adds up. A future runtime-level sequential pipeline primitive could eliminate the relay agent entirely.
When to avoid
If one agent with sequential tool calls can handle the full pipeline, do not split into multiple agents. Multi-agent adds latency and token cost. Split only when stages need different system prompts, tool sets, or model capabilities.
Parallel Fan-Out/Gather
Independent tasks run concurrently, then a synthesizer combines results.
┌-> [Security Auditor] -> security_report
[Orchestrator] ---┼-> [Style Checker] -> style_report
└-> [Performance Analyst] -> perf_report
|
[Synthesizer] -> unified review
Setup
The orchestrator prompt instructs the LLM to issue multiple agent_run tool calls in a single step. When the agent’s tool execution mode supports parallel execution, the runtime runs them concurrently and returns all results before the next LLM turn.
.with_agent_spec(AgentDefinitionSpec::local_with_id("orchestrator", AgentDefinition::new("deepseek-chat")
.with_system_prompt("Review the submitted code:
1. Launch all three reviewers in parallel by calling agent_run for each in the same response:
- agent_run(\"security_auditor\")
- agent_run(\"style_checker\")
- agent_run(\"perf_analyst\")
2. Read the results from each tool call response.
3. Call agent_run(\"synthesizer\") with all three reports.
4. Return the unified review.
IMPORTANT: call all three agent_run tools at the same time, not one after another.")
.with_allowed_agents(vec![
"security_auditor".to_string(), "style_checker".to_string(), "perf_analyst".to_string(), "synthesizer".to_string()
])
.with_tool_execution_mode(ToolExecutionMode::ParallelBatchApproval)))
The key is prompting the LLM to emit multiple tool calls in one response. Most capable models respect this instruction and produce parallel agent_run calls that the runtime executes concurrently.
Limitations
Parallelism is best-effort with no runtime guarantee:
- The approach depends on the LLM choosing to emit multiple tool calls in a single response. If the model serializes them across turns, execution falls back to sequential.
- The runtime cannot force the LLM to produce parallel calls — it can only execute them concurrently when the LLM does.
- A future runtime-level fan-out primitive could guarantee parallelism independent of LLM behavior.
Key decisions
- Set
tool_execution_modetoParallelBatchApprovalorParallelStreamingto enable concurrent tool execution. - Prompt must explicitly instruct the LLM to call all agents at the same time.
- If reviewers depend on each other, use Sequential Pipeline instead.
- Keep each reviewer’s tool set and prompt isolated so they can run independently.
Hierarchical Decomposition
A parent agent breaks complex tasks into subtasks and delegates recursively.
[Report Writer]
|-> agent_run("researcher")
| |-> agent_run("web_search")
| |-> agent_run("summarizer")
|-> agent_run("formatter")
Setup
Middle-layer agents also have delegation tools:
.with_agent_spec(AgentDefinitionSpec::local_with_id("researcher", AgentDefinition::new("deepseek-chat")
.with_system_prompt("Research the given topic. Use web_search for facts, summarizer for condensing.")
.with_allowed_agents(vec!["web_search".to_string(), "summarizer".to_string()])))
.with_agent_spec(AgentDefinitionSpec::local_with_id("report_writer", AgentDefinition::new("deepseek-chat")
.with_system_prompt("Write a report. Delegate research to the researcher agent, formatting to the formatter agent.")
.with_allowed_agents(vec!["researcher".to_string(), "formatter".to_string()])))
Key decisions
- Each layer only sees its allowed children, not the full agent tree.
- Leaf agents should exclude delegation tools entirely.
- Depth is unlimited but each level adds latency.
Generator-Critic
The generator is the main agent. The critic is a child agent called via agent_run. The generator writes output, invokes the critic for review, and revises based on feedback — all within its own loop with full message history.
[Generator] -> write draft
-> agent_run("critic") with draft -> PASS or FAIL + feedback
-> if FAIL: revise draft (history preserved)
-> agent_run("critic") again
-> repeat until PASS
This is better than an orchestrator-mediated approach because the generator retains its full conversation history across revision cycles. It sees all previous drafts and critic feedback naturally, without a middle agent relaying messages.
Setup
.with_agent_spec(AgentDefinitionSpec::local_with_id("critic", AgentDefinition::new("deepseek-chat")
.with_system_prompt("Validate the SQL query. Output exactly PASS if correct. \
Otherwise output FAIL followed by specific errors.")
.with_excluded_tools(vec!["agent_run".to_string(), "agent_stop".to_string()])))
.with_agent_spec(AgentDefinitionSpec::local_with_id("generator", AgentDefinition::new("deepseek-chat")
.with_system_prompt("You are a SQL query writer with a built-in review process:
1. Write a SQL query for the user's requirement.
2. Call agent_run for agent_id=critic with your query, background=false.
3. If critic returns FAIL, revise based on the feedback and call critic again.
4. Repeat until critic returns PASS or 5 attempts are reached.
5. Return the final validated SQL.
Always call the critic before finishing.")
.with_allowed_agents(vec!["critic".to_string()])
.with_max_rounds(20)))
Key decisions
- The generator is the entry point (
agent_idinRunRequest), not a separate orchestrator. - The critic should exclude delegation tools to prevent recursion.
- Set
max_roundshigh enough to allow multiple generate-critique cycles. - The critic must produce a clear pass/fail signal the generator can parse.
- Works best when validation criteria are objective (syntax, schema, format).
Iterative Refinement
Extends Generator-Critic with a dedicated refiner child agent. The generator writes the initial draft, the critic reviews, and the refiner applies improvements based on feedback.
[Generator] -> draft
-> agent_run("critic") -> feedback
-> agent_run("refiner") with draft + feedback -> improved draft
-> agent_run("critic") again -> feedback or PASS
-> repeat (max N iterations)
The setup follows the same pattern as Generator-Critic: the generator is the main agent with allowed_agents: ["critic", "refiner"]. Both critic and refiner are leaf agents without delegation tools.
Human-in-the-Loop
High-risk operations pause for human approval before execution.
[Agent] -> prepare transfer of $50,000
-> tool suspends with SuspendTicket
-> run transitions to Waiting
-> human approves/denies
-> run resumes or cancels
This pattern uses Tirea’s built-in suspension model, not prompt engineering.
Setup
See Enable Tool Permission HITL for the full configuration. The key components:
PermissionPluginintercepts tool calls and emitsSuspendactionsToolCallDecisionchannel carries human decisions back into the loop- Two inbound paths: decision forwarding (same run) and continuation (new run)
Key decisions
- HITL is orthogonal to other patterns. Combine it with any pattern above.
- Use
parallel_streamingtool execution mode for low-latency approval UX. - See HITL and Decision Flow for the full runtime model.
Swarm / Peer Handoff
TODO — This pattern is not yet supported. The section below describes the target behavior and current gap.
Agents operate as peers without a central coordinator. Each agent decides when to hand off to another. The handoff transfers full control: Agent A exits, Agent B takes over the same thread with shared message history.
[Alice] <-> [Bob] <-> [Charlie]
(math) (code) (writing)
Unlike Coordinator, there is no parent-child hierarchy. The active agent switches identity, not spawns a child.
Why delegation does not work for handoff
Tirea’s current agent_run model is hierarchical (parent spawns child). Using it for handoff has fundamental issues:
- Nesting depth grows per hop. A→B→C means C runs as a grandchild. Each hop adds a level of indirection and resource consumption.
- No shared message history. Each child agent gets its own thread. The handoff target cannot see the prior conversation.
- Parent does not exit. The parent agent remains alive waiting for the child to finish, consuming context window.
A true peer handoff requires a different mechanism: the runtime switches the active agent’s identity (system prompt, tools, model) on the same thread within the same run. This is not yet implemented.
When to Use Multi-Agent
Split into multiple agents when:
- Stages need different system prompts, tools, or models
- A single context window cannot hold all required information
- Parallel execution provides meaningful speedup
- An independent critic role improves output quality over self-critique
- Isolation is needed (each agent sees only its allowed tools and data)
Stay with a single agent when:
- One agent with a few tools handles the full task
- The overhead of delegation (extra LLM calls, latency) outweighs the benefit
- The task is simple enough that prompt instructions suffice
See Also
- Sub-Agent Delegation for the runtime model behind delegation tools
- Use Sub-Agent Delegation for setup instructions
- HITL and Decision Flow for suspension mechanics
- Architecture for the three-layer runtime model