LLM Backends¶
Claw SDK separates the agent graph from the LLM provider. You define agents, edges, and artifacts declaratively; the LLM backend is a pluggable dependency injected into the runtime at execution time.
This guide covers:
- The `LLM` protocol that all backends implement
- `LiteLLMBackend` for production use with 100+ providers
- `auto_detect_llm()` for zero-config setup from environment variables
- `MockLLM` for deterministic testing without API keys
- The `Prompt` and `LLMResponse` data models
- Multi-turn conversation support for the ReAct tool loop
- Retry behavior in the runtime
The LLM Protocol¶
Every LLM backend implements a single async method:
```python
from typing import Protocol, runtime_checkable

from claw import Prompt, LLMResponse

@runtime_checkable
class LLM(Protocol):  # exported by claw; shown here for reference
    async def complete(self, prompt: Prompt, *, model: str = "") -> LLMResponse: ...
```
The model keyword argument allows per-call model overrides. Backends may use it to route to a specific model or ignore it if they are pre-configured.
Because LLM is a typing.Protocol decorated with @runtime_checkable, you can verify that a custom backend conforms at runtime:
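For example, checking that the built-in backend (or any custom class) satisfies the protocol:

```python
from claw import LLM, LiteLLMBackend

assert isinstance(LiteLLMBackend(), LLM)  # possible because LLM is @runtime_checkable
```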
LiteLLMBackend¶
LiteLLMBackend is the production backend. It wraps LiteLLM to provide a single class that works with any LLM provider -- Anthropic, Google, OpenAI, and 100+ others -- via model string prefixes.
Basic usage¶
```python
from claw import LiteLLMBackend

# Anthropic Claude
llm = LiteLLMBackend(default_model="anthropic/claude-sonnet-4-20250514")

# Google Gemini
llm = LiteLLMBackend(default_model="gemini/gemini-2.5-flash")

# OpenAI
llm = LiteLLMBackend(default_model="openai/gpt-4o")
```
Constructor parameters¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| `default_model` | `str` | `"anthropic/claude-sonnet-4-20250514"` | Default model string when none is specified per-call |
| `temperature` | `float` | `0.0` | Sampling temperature (0.0 = deterministic) |
| `max_tokens` | `int` | `4096` | Maximum tokens in the response |
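For example, combining the parameters above (the values are illustrative):

```python
from claw import LiteLLMBackend

llm = LiteLLMBackend(
    default_model="anthropic/claude-sonnet-4-20250514",
    temperature=0.0,   # deterministic sampling
    max_tokens=2048,   # cap response length
)
```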
Environment variables¶
API keys are read from environment variables automatically by LiteLLM. Set the variable for the provider(s) you use:
| Provider | Variable | Example |
|---|---|---|
| Anthropic | `ANTHROPIC_API_KEY` | `sk-ant-...` |
| Google | `GOOGLE_API_KEY` or `GEMINI_API_KEY` | `AI...` |
| OpenAI | `OPENAI_API_KEY` | `sk-...` |
Model strings¶
Model strings follow the LiteLLM convention: provider/model-name. Examples:
- `anthropic/claude-sonnet-4-20250514`
- `anthropic/claude-opus-4-0-20250514`
- `gemini/gemini-2.5-flash`
- `openai/gpt-4o`
See the LiteLLM model list for the full set of supported providers and model strings.
Per-agent models¶
Each agent in a society can specify its own model string. The runtime passes this to the backend's complete() method via the model keyword argument. The LiteLLMBackend.default_model is used when an agent's model field is not a valid LiteLLM model string.
```python
from claw import Agent, Society, LiteLLMBackend, LocalRuntime

# Each agent can use a different model
pm = Agent(name="pm", role="manager", model="anthropic/claude-sonnet-4-20250514")
coder = Agent(name="coder", role="implementer", model="anthropic/claude-opus-4-0-20250514")

society = Society("mixed-models")
society.add(pm, coder)

# The backend's default_model is the fallback
llm = LiteLLMBackend(default_model="anthropic/claude-sonnet-4-20250514")
runtime = LocalRuntime(llm)
```
Auto-Detection¶
auto_detect_llm() is a convenience function that creates a LiteLLMBackend by scanning your environment variables. It is the fastest way to go from zero to a working LLM backend -- no model strings, no provider selection.
Basic usage¶
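A minimal sketch -- which model you get depends on which API key is present in your environment (see the detection order below):

```python
from claw import auto_detect_llm, LocalRuntime

llm = auto_detect_llm()      # picks a provider from your environment variables
runtime = LocalRuntime(llm)  # use it like any other backend
```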
Detection order¶
The function checks environment variables in this order and uses the first match:
| Priority | Environment variable | Model selected |
|---|---|---|
| 1 | `ANTHROPIC_API_KEY` | `anthropic/claude-sonnet-4-20250514` |
| 2 | `GEMINI_API_KEY` or `GOOGLE_API_KEY` | `gemini/gemini-2.5-flash` |
| 3 | `OPENAI_API_KEY` | `openai/gpt-4o` |
The LITELLM_MODEL environment variable acts as a top-priority override. When set, it is used as the model string directly, bypassing the detection order entirely. This is useful for non-standard providers or pinning a specific model:
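For example (the pinned model string is illustrative; you would normally export the variable in your shell):

```python
import os

os.environ["LITELLM_MODEL"] = "anthropic/claude-opus-4-0-20250514"

from claw import auto_detect_llm

llm = auto_detect_llm()  # uses the LITELLM_MODEL override, skipping detection
```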
If no LITELLM_MODEL is set and no API key is found, auto_detect_llm() raises RuntimeError with a message listing the expected variables.
Parameters¶
auto_detect_llm() accepts the same tuning parameters as LiteLLMBackend:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `temperature` | `float` | `0.0` | Sampling temperature |
| `max_tokens` | `int` | `4096` | Maximum tokens in the response |
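For example, to loosen sampling and cap output length:

```python
from claw import auto_detect_llm

llm = auto_detect_llm(temperature=0.2, max_tokens=2048)
```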
When to use auto_detect_llm() vs explicit LiteLLMBackend¶
Use auto_detect_llm() when:
- You want the quickest path to a working society (quickstarts, demos, examples)
- You don't care which specific model is used
- You want your code to work across machines with different providers configured
Use LiteLLMBackend directly when:
- You need a specific model (e.g., claude-opus-4-0-20250514 for complex reasoning)
- You need per-agent model overrides (see Per-agent models)
- You are deploying to production and want explicit control over the model
Full example¶
```python
import asyncio

from claw import Agent, Society, Delegation, LocalRuntime, auto_detect_llm

pm = Agent(name="pm", role="project manager")
coder = Agent(name="coder", role="implementer")

s = Society(name="quick-demo")
s.connect(pm, coder, Delegation())

llm = auto_detect_llm()

async def main():
    runtime = LocalRuntime(llm)
    result = await runtime.run(s, "Write hello.py")
    print(result.status)

asyncio.run(main())
```
MockLLM¶
MockLLM is the testing backend. It returns deterministic, scripted responses without making any API calls. Use it in unit tests, integration tests, and local development.
Basic usage¶
```python
from claw import MockLLM, LLMResponse, ToolCall

llm = MockLLM()

# Script responses keyed by system prompt pattern (the agent name is enough)
llm.script("pm", responses=[
    LLMResponse(tool_calls=[
        ToolCall(name="emit_event", arguments={
            "event_type": "task_delegated",
            "target_agent": "coder",
            "data": {"task": "Write hello.py"},
        })
    ]),
    LLMResponse(content="acknowledged"),
])
```
Pattern matching¶
Patterns match against the compiled system prompt using substring matching first, then regex. Since compiled prompts start with "You are agent <name>, ...", you can use the agent name as a pattern to uniquely identify agents:
llm.script("dev", responses=[LLMResponse(content="I'll implement this")])
llm.script("reviewer", responses=[LLMResponse(content="LGTM")])
You can also match on the event description using the event_pattern keyword:
```python
llm.script(
    "reviewer",
    event_pattern="review_requested",
    responses=[
        LLMResponse(tool_calls=[ToolCall(name="approve", arguments={"comment": "LGTM"})]),
    ],
)
```
When both agent_pattern and event_pattern are provided, the more specific match wins. Matching priority:
- Agent + event pattern match (most specific)
- Event pattern only
- Agent pattern only
- Default fallback
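For example, with both of the following scripts registered, a `review_requested` event sent to the reviewer uses the first (agent + event pattern), and any other event for the reviewer falls back to the second (agent pattern only):

```python
llm.script("reviewer", event_pattern="review_requested", responses=[
    LLMResponse(content="LGTM"),          # used for review_requested events
])
llm.script("reviewer", responses=[
    LLMResponse(content="acknowledged"),  # used for the reviewer's other events
])
```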
Sequential responses¶
Each call to a matching pattern consumes the next response in the list. When the list is exhausted, the last response repeats:
```python
from claw import MockLLM, LLMResponse, Prompt

llm = MockLLM()
llm.script("dev", responses=[
    LLMResponse(content="first"),
    LLMResponse(content="second"),
    LLMResponse(content="third"),
])

prompt = Prompt(system="You are agent dev")

# Inside an async test:
r1 = await llm.complete(prompt)  # "first"
r2 = await llm.complete(prompt)  # "second"
r3 = await llm.complete(prompt)  # "third"
r4 = await llm.complete(prompt)  # "third" (repeats last)
```
Default response¶
When no script matches, MockLLM returns a default response. The default is LLMResponse(content="acknowledged"), but you can customize it:
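A sketch of customizing it, assuming the default can be supplied at construction time -- the `default_response` parameter name is an assumption, not confirmed by this page:

```python
from claw import MockLLM, LLMResponse

# Hypothetical parameter name; check your SDK version for the exact spelling.
llm = MockLLM(default_response=LLMResponse(content="noted"))
```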
Call recording¶
MockLLM always records every call. Use this to inspect what the runtime sent to the LLM:
```python
llm = MockLLM()

# ... run the society ...

# Inspect all recorded calls
for call in llm.calls:
    print(call["system"][:50])  # system prompt
    print(call["event"])        # event description
    print(call["model"])        # model string passed by runtime

# Total call count
print(llm.call_count)

# Filter calls by agent
dev_calls = llm.calls_for("dev")
reviewer_calls = llm.calls_for("reviewer")
```
Reset¶
Reset call history and sequence counters between tests:
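A minimal sketch, assuming the method is named `reset()`:

```python
llm.reset()  # clears recorded calls and restarts scripted sequences
```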
After a reset, scripted sequences start over from the first response.
Prompt Structure¶
The Prompt dataclass is the structured request format that all backends receive. The Claw compiler builds prompts from the society graph, and the runtime passes them to the LLM backend.
```python
from claw import Prompt, ContextBlock, ContextType, Message

prompt = Prompt(
    system="You are a code reviewer...",
    context=[
        ContextBlock(type=ContextType.ARTIFACT, content="...file contents..."),
        ContextBlock(type=ContextType.INSTRUCTION, content="Be concise."),
    ],
    event="Review this PR submission.",
    tools=[{"name": "approve", "description": "Approve the PR", "parameters": {}}],
    messages=[],  # Multi-turn conversation history
)
```
Fields¶
| Field | Type | Purpose |
|---|---|---|
| `system` | `str` | Compiled system prompt (generated by the compiler from the society graph) |
| `context` | `list[ContextBlock]` | Artifact content, instructions, edge summaries, event history |
| `event` | `str` | The triggering event description and data |
| `tools` | `list[dict]` | Available tool schemas in OpenAI function-calling format |
| `messages` | `list[Message]` | Multi-turn conversation history (used by the ReAct loop) |
Context types¶
Each ContextBlock is tagged with a ContextType for traceability:
| ContextType | Description |
|---|---|
| `ARTIFACT` | Content from a versioned artifact |
| `EVENT_HISTORY` | Previous events visible to this agent |
| `EDGE_SUMMARY` | Summary of the agent's relationships |
| `INSTRUCTION` | Agent-specific or edge-specific instructions |
The optional source field on ContextBlock records where the context came from (e.g., the artifact name or edge ID).
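For example (the source value is illustrative):

```python
block = ContextBlock(
    type=ContextType.ARTIFACT,
    content="...file contents...",
    source="design_doc",  # illustrative artifact name
)
```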
LLMResponse¶
The LLMResponse dataclass is what backends return:
```python
from claw import LLMResponse, ToolCall, Usage

response = LLMResponse(
    content="I've reviewed the code and it looks good.",
    tool_calls=[
        ToolCall(name="approve", arguments={"comment": "LGTM"}, id="call_1"),
    ],
    usage=Usage(input_tokens=1500, output_tokens=200),
)
```
Fields¶
| Field | Type | Description |
|---|---|---|
| `content` | `str` | Text response from the LLM |
| `tool_calls` | `list[ToolCall]` | Structured tool invocations |
| `usage` | `Usage` | Token usage statistics (`input_tokens`, `output_tokens`) |
Convenience properties response.input_tokens and response.output_tokens are available as shortcuts for response.usage.input_tokens and response.usage.output_tokens.
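In other words:

```python
assert response.input_tokens == response.usage.input_tokens
assert response.output_tokens == response.usage.output_tokens
```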
Multi-Turn Conversation¶
The ReAct tool loop uses Message objects to feed tool results back to the LLM across multiple turns. The runtime manages this automatically, but understanding the structure is useful for testing and debugging.
```python
from claw import Message, ToolCall

messages = [
    Message(role="assistant", content="Let me edit the file.", tool_calls=[
        ToolCall(
            name="file_edit",
            arguments={"action": "write", "path": "hello.py", "content": "print('hello')"},
            id="call_1",
        ),
    ]),
    Message(role="tool", content="File written: hello.py", tool_call_id="call_1"),
]
```
Message roles¶
| Role | When | Key fields |
|---|---|---|
| `"assistant"` | LLM response with tool calls | `content`, `tool_calls` |
| `"tool"` | Tool execution result | `content`, `tool_call_id` |
| `"user"` | User input (rare in agent loops) | `content` |
The tool_call_id field on tool-result messages links the result back to the specific ToolCall that produced it, using the id field from the ToolCall.
Retry Behavior¶
LocalRuntime includes app-level retry logic for empty LLM responses:
- Max retries: 3 attempts
- Backoff schedule: 1s, 2s, 4s (exponential)
- Trigger: Response has no content and no tool calls
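Conceptually, the policy looks like this (a sketch of the documented behavior, not the actual LocalRuntime source):

```python
import asyncio

async def complete_with_retry(llm, prompt, *, model=""):
    # Sketch only: retry up to 3 times with 1s/2s/4s backoff,
    # and only when the response has no content and no tool calls.
    response = await llm.complete(prompt, model=model)
    for delay in (1, 2, 4):
        if response.content or response.tool_calls:
            return response
        await asyncio.sleep(delay)
        response = await llm.complete(prompt, model=model)
    return response
```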
HTTP-level retries (rate limits, transient errors) are handled by LiteLLM internally and are separate from this app-level retry.
Writing a Custom Backend¶
To implement a custom backend, create a class with an async complete method matching the LLM protocol:
```python
from claw import LLM, Prompt, LLMResponse

class MyCustomBackend:
    async def complete(self, prompt: Prompt, *, model: str = "") -> LLMResponse:
        # Your implementation here
        text = await my_api_call(prompt.system, prompt.event)
        return LLMResponse(content=text)

# Verify it conforms to the protocol
assert isinstance(MyCustomBackend(), LLM)
```
Then pass it to the runtime:
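```python
from claw import LocalRuntime

runtime = LocalRuntime(MyCustomBackend())
```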
No registration step is needed. Any object that implements complete(prompt, *, model) -> LLMResponse works.
Quick Reference¶
| Backend | Use case | API keys required |
|---|---|---|
| `auto_detect_llm()` | Quickstarts, demos, examples | Yes (auto-detected from env) |
| `LiteLLMBackend` | Production, explicit model control | Yes (provider-specific env vars) |
| `MockLLM` | Unit tests, integration tests | No |
| Custom class | Special providers, local models | Depends on implementation |
Imports:
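The LLM-related names used in this guide are all imported from the top-level claw package:

```python
from claw import (
    LLM,
    LiteLLMBackend,
    MockLLM,
    auto_detect_llm,
    Prompt,
    LLMResponse,
    ToolCall,
    Usage,
    Message,
    ContextBlock,
    ContextType,
)
```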