LLM Backends

Claw SDK separates the agent graph from the LLM provider. You define agents, edges, and artifacts declaratively; the LLM backend is a pluggable dependency injected into the runtime at execution time.

This guide covers:

  • The LLM protocol that all backends implement
  • LiteLLMBackend for production use with 100+ providers
  • auto_detect_llm() for zero-config setup from environment variables
  • MockLLM for deterministic testing without API keys
  • The Prompt and LLMResponse data models
  • Multi-turn conversation support for the ReAct tool loop
  • Retry behavior in the runtime

The LLM Protocol

Every LLM backend implements a single async method:

from typing import Protocol, runtime_checkable

from claw import Prompt, LLMResponse

@runtime_checkable
class LLM(Protocol):
    async def complete(self, prompt: Prompt, *, model: str = "") -> LLMResponse: ...

The model keyword argument allows per-call model overrides. Backends may use it to route to a specific model or ignore it if they are pre-configured.
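To illustrate the per-call override, here is a hypothetical backend that routes on the model argument. The Prompt and LLMResponse classes below are minimal stand-ins so the sketch runs without the SDK installed; in real code they come from claw:

```python
import asyncio
from dataclasses import dataclass

# Stand-in types (illustrative only; use the real classes from `claw`).
@dataclass
class Prompt:
    system: str = ""

@dataclass
class LLMResponse:
    content: str = ""

class RoutingBackend:
    """Hypothetical backend honoring the per-call `model` override."""
    def __init__(self, default_model: str):
        self.default_model = default_model

    async def complete(self, prompt: Prompt, *, model: str = "") -> LLMResponse:
        chosen = model or self.default_model  # per-call override wins
        return LLMResponse(content=f"handled by {chosen}")

backend = RoutingBackend(default_model="anthropic/claude-sonnet-4-20250514")
r1 = asyncio.run(backend.complete(Prompt(system="hi")))
# r1.content == "handled by anthropic/claude-sonnet-4-20250514"
r2 = asyncio.run(backend.complete(Prompt(system="hi"), model="openai/gpt-4o"))
# r2.content == "handled by openai/gpt-4o"
```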

Because LLM is a typing.Protocol decorated with @runtime_checkable, you can verify that a custom backend conforms at runtime:

assert isinstance(my_backend, LLM)

LiteLLMBackend

LiteLLMBackend is the production backend. It wraps LiteLLM to provide a single class that works with any LLM provider -- Anthropic, Google, OpenAI, and 100+ others -- via model string prefixes.

Basic usage

from claw import LiteLLMBackend

# Anthropic Claude
llm = LiteLLMBackend(default_model="anthropic/claude-sonnet-4-20250514")

# Google Gemini
llm = LiteLLMBackend(default_model="gemini/gemini-2.5-flash")

# OpenAI
llm = LiteLLMBackend(default_model="openai/gpt-4o")

Constructor parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| default_model | str | "anthropic/claude-sonnet-4-20250514" | Model string used when none is specified per-call |
| temperature | float | 0.0 | Sampling temperature (0.0 = deterministic) |
| max_tokens | int | 4096 | Maximum tokens in the response |

Environment variables

API keys are read from environment variables automatically by LiteLLM. Set the variable for the provider(s) you use:

| Provider | Variable | Example |
| --- | --- | --- |
| Anthropic | ANTHROPIC_API_KEY | sk-ant-... |
| Google | GOOGLE_API_KEY or GEMINI_API_KEY | AI... |
| OpenAI | OPENAI_API_KEY | sk-... |

export ANTHROPIC_API_KEY="sk-ant-..."

Model strings

Model strings follow the LiteLLM convention: provider/model-name. Examples:

  • anthropic/claude-sonnet-4-20250514
  • anthropic/claude-opus-4-0-20250514
  • gemini/gemini-2.5-flash
  • openai/gpt-4o

See the LiteLLM model list for the full set of supported providers and model strings.

Per-agent models

Each agent in a society can specify its own model string. The runtime passes this to the backend's complete() method via the model keyword argument. The LiteLLMBackend.default_model is used when an agent's model field is not a valid LiteLLM model string.

from claw import Agent, Society, LiteLLMBackend, LocalRuntime

# Each agent can use a different model
pm = Agent(name="pm", role="manager", model="anthropic/claude-sonnet-4-20250514")
coder = Agent(name="coder", role="implementer", model="anthropic/claude-opus-4-0-20250514")

society = Society("mixed-models")
society.add(pm, coder)

# The backend's default_model is the fallback
llm = LiteLLMBackend(default_model="anthropic/claude-sonnet-4-20250514")
runtime = LocalRuntime(llm)

Auto-Detection

auto_detect_llm() is a convenience function that creates a LiteLLMBackend by scanning your environment variables. It is the fastest way to go from zero to a working LLM backend -- no model strings, no provider selection.

Basic usage

from claw import auto_detect_llm

llm = auto_detect_llm()  # Just works if any API key is set

Detection order

The function checks environment variables in this order and uses the first match:

| Priority | Environment variable | Model selected |
| --- | --- | --- |
| 1 | ANTHROPIC_API_KEY | anthropic/claude-sonnet-4-20250514 |
| 2 | GEMINI_API_KEY or GOOGLE_API_KEY | gemini/gemini-2.5-flash |
| 3 | OPENAI_API_KEY | openai/gpt-4o |

The LITELLM_MODEL environment variable acts as a top-priority override. When set, it is used as the model string directly, bypassing the detection order entirely. This is useful for non-standard providers or pinning a specific model:

export LITELLM_MODEL="anthropic/claude-opus-4-0-20250514"
export ANTHROPIC_API_KEY="sk-ant-..."

If no LITELLM_MODEL is set and no API key is found, auto_detect_llm() raises RuntimeError with a message listing the expected variables.
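The detection logic described above can be sketched as a pure function over an environment mapping. This is illustrative only (the real auto_detect_llm() reads os.environ and constructs the backend):

```python
def detect_model(env: dict) -> str:
    """Sketch of the documented detection order, over a plain dict."""
    # LITELLM_MODEL is a top-priority override that bypasses detection.
    if env.get("LITELLM_MODEL"):
        return env["LITELLM_MODEL"]
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic/claude-sonnet-4-20250514"
    if env.get("GEMINI_API_KEY") or env.get("GOOGLE_API_KEY"):
        return "gemini/gemini-2.5-flash"
    if env.get("OPENAI_API_KEY"):
        return "openai/gpt-4o"
    raise RuntimeError(
        "No LLM API key found. Set ANTHROPIC_API_KEY, GEMINI_API_KEY, "
        "GOOGLE_API_KEY, or OPENAI_API_KEY (or LITELLM_MODEL)."
    )
```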

Parameters

auto_detect_llm() accepts the same tuning parameters as LiteLLMBackend:

llm = auto_detect_llm(temperature=0.7, max_tokens=8192)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| temperature | float | 0.0 | Sampling temperature |
| max_tokens | int | 4096 | Maximum tokens in the response |

When to use auto_detect_llm() vs explicit LiteLLMBackend

Use auto_detect_llm() when:

  • You want the quickest path to a working society (quickstarts, demos, examples)
  • You don't care which specific model is used
  • You want your code to work across machines with different providers configured

Use LiteLLMBackend directly when:

  • You need a specific model (e.g., claude-opus-4-0-20250514 for complex reasoning)
  • You need per-agent model overrides (see Per-agent models)
  • You are deploying to production and want explicit control over the model

Full example

import asyncio
from claw import Agent, Society, Delegation, LocalRuntime, auto_detect_llm

pm = Agent(name="pm", role="project manager")
coder = Agent(name="coder", role="implementer")

s = Society(name="quick-demo")
s.connect(pm, coder, Delegation())

llm = auto_detect_llm()

async def main():
    runtime = LocalRuntime(llm)
    result = await runtime.run(s, "Write hello.py")
    print(result.status)

asyncio.run(main())

MockLLM

MockLLM is the testing backend. It returns deterministic, scripted responses without making any API calls. Use it in unit tests, integration tests, and local development.

Basic usage

from claw import MockLLM, LLMResponse, ToolCall

llm = MockLLM()

# Script responses keyed by system prompt pattern
llm.script("You are pm,", responses=[
    LLMResponse(tool_calls=[
        ToolCall(name="emit_event", arguments={
            "event_type": "task_delegated",
            "target_agent": "coder",
            "data": {"task": "Write hello.py"},
        })
    ]),
    LLMResponse(content="acknowledged"),
])

Pattern matching

Patterns match against the compiled system prompt using substring matching first, then regex. Since compiled prompts start with "You are agent <name>, ...", you can use the agent name as a pattern to uniquely identify agents:

llm.script("dev", responses=[LLMResponse(content="I'll implement this")])
llm.script("reviewer", responses=[LLMResponse(content="LGTM")])

You can also match on the event description using the event_pattern keyword:

llm.script(
    "reviewer",
    event_pattern="review_requested",
    responses=[
        LLMResponse(tool_calls=[ToolCall(name="approve", arguments={"comment": "LGTM"})]),
    ],
)

When both agent_pattern and event_pattern are provided, the more specific match wins. Matching priority:

  1. Agent + event pattern match (most specific)
  2. Event pattern only
  3. Agent pattern only
  4. Default fallback
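The matching rules above can be approximated in a short sketch. Both the substring-then-regex matcher and the priority scoring are illustrative reconstructions of the documented behavior, not the SDK's actual implementation:

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Substring match first, then regex (mirrors the documented order)."""
    if pattern in text:
        return True
    try:
        return re.search(pattern, text) is not None
    except re.error:
        return False

def pick_script(scripts, system: str, event: str):
    """Hypothetical selector implementing the documented priority.
    Each script is an (agent_pattern, event_pattern, responses) tuple;
    either pattern may be None."""
    def score(s):
        agent_pat, event_pat, _ = s
        agent_ok = agent_pat is not None and matches(agent_pat, system)
        event_ok = event_pat is not None and matches(event_pat, event)
        if agent_pat and event_pat:
            return 4 if (agent_ok and event_ok) else 0  # most specific
        if event_pat:
            return 3 if event_ok else 0
        return 2 if agent_ok else 0
    best = max(scripts, key=score, default=None)
    return best if best is not None and score(best) > 0 else None
```

For example, with an agent-only script, an event-only script, and an agent+event script all registered, a prompt matching all three selects the agent+event script.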

Sequential responses

Each call to a matching pattern consumes the next response in the list. When the list is exhausted, the last response repeats:

llm = MockLLM()
llm.script("dev", responses=[
    LLMResponse(content="first"),
    LLMResponse(content="second"),
    LLMResponse(content="third"),
])

prompt = Prompt(system="You are agent dev")
r1 = await llm.complete(prompt)  # "first"
r2 = await llm.complete(prompt)  # "second"
r3 = await llm.complete(prompt)  # "third"
r4 = await llm.complete(prompt)  # "third" (repeats last)
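The consume-then-repeat-last behavior amounts to a bounded index over the response list. A minimal sketch, using plain strings in place of LLMResponse objects:

```python
class ScriptedResponses:
    """Sketch of MockLLM's per-pattern sequencing: consume responses in
    order, then repeat the last one once the list is exhausted."""
    def __init__(self, responses):
        self.responses = list(responses)
        self.index = 0

    def next(self):
        # Clamp to the last element when exhausted.
        i = min(self.index, len(self.responses) - 1)
        self.index += 1
        return self.responses[i]

    def reset(self):
        # Restart the sequence from the first response.
        self.index = 0

seq = ScriptedResponses(["first", "second", "third"])
out = [seq.next() for _ in range(4)]  # ["first", "second", "third", "third"]
seq.reset()
again = seq.next()  # "first"
```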

Default response

When no script matches, MockLLM returns a default response. The default is LLMResponse(content="acknowledged"), but you can customize it:

llm = MockLLM(default=LLMResponse(content="custom default"))

Call recording

MockLLM always records every call. Use this to inspect what the runtime sent to the LLM:

llm = MockLLM()

# ... run the society ...

# Inspect all recorded calls
for call in llm.calls:
    print(call["system"][:50])  # system prompt
    print(call["event"])         # event description
    print(call["model"])         # model string passed by runtime

# Total call count
print(llm.call_count)

# Filter calls by agent
dev_calls = llm.calls_for("dev")
reviewer_calls = llm.calls_for("reviewer")

Reset

Reset call history and sequence counters between tests:

llm.reset()
assert llm.call_count == 0

After a reset, scripted sequences start over from the first response.


Prompt Structure

The Prompt dataclass is the structured request format that all backends receive. The Claw compiler builds prompts from the society graph, and the runtime passes them to the LLM backend.

from claw import Prompt, ContextBlock, ContextType, Message

prompt = Prompt(
    system="You are a code reviewer...",
    context=[
        ContextBlock(type=ContextType.ARTIFACT, content="...file contents..."),
        ContextBlock(type=ContextType.INSTRUCTION, content="Be concise."),
    ],
    event="Review this PR submission.",
    tools=[{"name": "approve", "description": "Approve the PR", "parameters": {}}],
    messages=[],  # Multi-turn conversation history
)

Fields

| Field | Type | Purpose |
| --- | --- | --- |
| system | str | Compiled system prompt (generated by the compiler from the society graph) |
| context | list[ContextBlock] | Artifact content, instructions, edge summaries, event history |
| event | str | The triggering event description and data |
| tools | list[dict] | Available tool schemas in OpenAI function-calling format |
| messages | list[Message] | Multi-turn conversation history (used by the ReAct loop) |

Context types

Each ContextBlock is tagged with a ContextType for traceability:

| ContextType | Description |
| --- | --- |
| ARTIFACT | Content from a versioned artifact |
| EVENT_HISTORY | Previous events visible to this agent |
| EDGE_SUMMARY | Summary of the agent's relationships |
| INSTRUCTION | Agent-specific or edge-specific instructions |

The optional source field on ContextBlock records where the context came from (e.g., the artifact name or edge ID).


LLMResponse

The LLMResponse dataclass is what backends return:

from claw import LLMResponse, ToolCall, Usage

response = LLMResponse(
    content="I've reviewed the code and it looks good.",
    tool_calls=[
        ToolCall(name="approve", arguments={"comment": "LGTM"}, id="call_1"),
    ],
    usage=Usage(input_tokens=1500, output_tokens=200),
)

Fields

| Field | Type | Description |
| --- | --- | --- |
| content | str | Text response from the LLM |
| tool_calls | list[ToolCall] | Structured tool invocations |
| usage | Usage | Token usage statistics (input_tokens, output_tokens) |

The convenience properties response.input_tokens and response.output_tokens are shortcuts for response.usage.input_tokens and response.usage.output_tokens.


Multi-Turn Conversation

The ReAct tool loop uses Message objects to feed tool results back to the LLM across multiple turns. The runtime manages this automatically, but understanding the structure is useful for testing and debugging.

from claw import Message, ToolCall

messages = [
    Message(role="assistant", content="Let me edit the file.", tool_calls=[
        ToolCall(
            name="file_edit",
            arguments={"action": "write", "path": "hello.py", "content": "print('hello')"},
            id="call_1",
        ),
    ]),
    Message(role="tool", content="File written: hello.py", tool_call_id="call_1"),
]

Message roles

| Role | When | Key fields |
| --- | --- | --- |
| "assistant" | LLM response with tool calls | content, tool_calls |
| "tool" | Tool execution result | content, tool_call_id |
| "user" | User input (rare in agent loops) | content |

The tool_call_id field on tool-result messages links the result back to the specific ToolCall that produced it, using the id field from the ToolCall.
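This id-based linkage can be demonstrated with plain dicts (stand-ins for the SDK's Message and ToolCall types; the pairing logic is illustrative):

```python
def pair_tool_results(messages):
    """Link each tool-result message back to the tool call that produced
    it, by matching tool_call_id against the assistant's ToolCall ids."""
    calls = {}
    for msg in messages:
        for call in msg.get("tool_calls", []):
            calls[call["id"]] = call["name"]
    pairs = []
    for msg in messages:
        if msg.get("role") == "tool":
            pairs.append((calls.get(msg["tool_call_id"]), msg["content"]))
    return pairs

history = [
    {"role": "assistant", "content": "Let me edit the file.",
     "tool_calls": [{"id": "call_1", "name": "file_edit"}]},
    {"role": "tool", "tool_call_id": "call_1", "content": "File written: hello.py"},
]
paired = pair_tool_results(history)  # [("file_edit", "File written: hello.py")]
```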


Retry Behavior

LocalRuntime includes app-level retry logic for empty LLM responses:

  • Max retries: 3 attempts
  • Backoff schedule: 1s, 2s, 4s (exponential)
  • Trigger: Response has no content and no tool calls

HTTP-level retries (rate limits, transient errors) are handled by LiteLLM internally and are separate from this app-level retry.
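The app-level retry described above amounts to a small backoff loop. This is a sketch, not the runtime's actual code; complete is any async callable, dicts stand in for LLMResponse, and base_delay is added so the example runs fast:

```python
import asyncio

async def complete_with_retry(complete, prompt, *, max_retries=3, base_delay=1.0):
    """Sketch of the documented retry: up to 3 attempts, backing off
    1s, 2s, 4s whenever the response has no content and no tool calls."""
    response = None
    for attempt in range(max_retries):
        response = await complete(prompt)
        if response.get("content") or response.get("tool_calls"):
            return response  # non-empty: done
        if attempt < max_retries - 1:
            await asyncio.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s
    return response  # still empty after all attempts

# Stub backend: empty twice, then a real answer.
answers = [{}, {}, {"content": "done"}]
async def stub(prompt):
    return answers.pop(0)

result = asyncio.run(complete_with_retry(stub, {}, base_delay=0))
# result == {"content": "done"}
```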


Writing a Custom Backend

To implement a custom backend, create a class with an async complete method matching the LLM protocol:

from claw import LLM, Prompt, LLMResponse

class MyCustomBackend:
    async def complete(self, prompt: Prompt, *, model: str = "") -> LLMResponse:
        # Your implementation here
        text = await my_api_call(prompt.system, prompt.event)
        return LLMResponse(content=text)

# Verify it conforms to the protocol
assert isinstance(MyCustomBackend(), LLM)

Then pass it to the runtime:

from claw import LocalRuntime

runtime = LocalRuntime(MyCustomBackend())

No registration step is needed. Any object that implements complete(prompt, *, model) -> LLMResponse works.


Quick Reference

| Backend | Use case | API keys required |
| --- | --- | --- |
| auto_detect_llm() | Quickstarts, demos, examples | Yes (auto-detected from env) |
| LiteLLMBackend | Production, explicit model control | Yes (provider-specific env vars) |
| MockLLM | Unit tests, integration tests | No |
| Custom class | Special providers, local models | Depends on implementation |

Imports:

# Core protocol and data models
from claw import LLM, Prompt, LLMResponse, ToolCall, Message, Usage
from claw import ContextBlock, ContextType

# Backends
from claw import LiteLLMBackend, MockLLM, auto_detect_llm