LLM Backends

Claw SDK separates the agent graph from the LLM provider. You define agents, edges, and artifacts declaratively; the LLM backend is a pluggable dependency injected into the runtime at execution time.

This guide covers:

  • The LLM protocol that all backends implement
  • LiteLLMBackend for production use with 100+ providers
  • auto_detect_llm() for zero-config setup from environment variables
  • MockLLM for deterministic testing without API keys
  • The Prompt and LLMResponse data models
  • Multi-turn conversation support for the ReAct tool loop
  • Retry behavior in the runtime

The LLM Protocol

Every LLM backend implements a single async method:

from typing import Protocol, runtime_checkable

from claw import Prompt, LLMResponse

@runtime_checkable
class LLM(Protocol):
    async def complete(self, prompt: Prompt, *, model: str = "") -> LLMResponse: ...

The model keyword argument allows per-call model overrides. Backends may use it to route to a specific model or ignore it if they are pre-configured.
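To illustrate the per-call override, here is a hypothetical backend that routes on the model argument. The Prompt and LLMResponse classes below are minimal stand-ins so the sketch runs without the SDK installed; in real code they come from claw:

```python
import asyncio
from dataclasses import dataclass

# Stand-in types (illustrative only; use the real classes from `claw`).
@dataclass
class Prompt:
    system: str = ""

@dataclass
class LLMResponse:
    content: str = ""

class RoutingBackend:
    """Hypothetical backend honoring the per-call `model` override."""
    def __init__(self, default_model: str):
        self.default_model = default_model

    async def complete(self, prompt: Prompt, *, model: str = "") -> LLMResponse:
        chosen = model or self.default_model  # per-call override wins
        return LLMResponse(content=f"handled by {chosen}")

backend = RoutingBackend(default_model="anthropic/claude-sonnet-4-20250514")
r1 = asyncio.run(backend.complete(Prompt(system="hi")))
# r1.content == "handled by anthropic/claude-sonnet-4-20250514"
r2 = asyncio.run(backend.complete(Prompt(system="hi"), model="openai/gpt-4o"))
# r2.content == "handled by openai/gpt-4o"
```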

Because LLM is a typing.Protocol decorated with @runtime_checkable, you can verify that a custom backend conforms at runtime:

assert isinstance(my_backend, LLM)

LiteLLMBackend

LiteLLMBackend is the production backend. It wraps LiteLLM to provide a single class that works with any LLM provider -- Anthropic, Google, OpenAI, and 100+ others -- via model string prefixes.

Basic usage

from claw import LiteLLMBackend

# Anthropic Claude
llm = LiteLLMBackend(default_model="anthropic/claude-sonnet-4-20250514")

# Google Gemini
llm = LiteLLMBackend(default_model="gemini/gemini-2.5-flash")

# OpenAI
llm = LiteLLMBackend(default_model="openai/gpt-4o")

Constructor parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| default_model | str | "anthropic/claude-sonnet-4-20250514" | Model string used when none is specified per-call |
| temperature | float | 0.0 | Sampling temperature (0.0 = deterministic) |
| max_tokens | int | 4096 | Maximum tokens in the response |

Environment variables

API keys are read from environment variables automatically by LiteLLM. Set the variable for the provider(s) you use:

| Provider | Variable | Example |
| --- | --- | --- |
| Anthropic | ANTHROPIC_API_KEY | sk-ant-... |
| Google | GOOGLE_API_KEY or GEMINI_API_KEY | AI... |
| OpenAI | OPENAI_API_KEY | sk-... |

export ANTHROPIC_API_KEY="sk-ant-..."

Model strings

Model strings follow the LiteLLM convention: provider/model-name. Examples:

  • anthropic/claude-sonnet-4-20250514
  • anthropic/claude-opus-4-0-20250514
  • gemini/gemini-2.5-flash
  • openai/gpt-4o

See the LiteLLM model list for the full set of supported providers and model strings.

Per-agent models

Each agent in a society can specify its own model string. The runtime passes this to the backend's complete() method via the model keyword argument. The LiteLLMBackend.default_model is used when an agent's model field is not a valid LiteLLM model string.

from claw import Agent, Society, LiteLLMBackend, LocalRuntime

# Each agent can use a different model
pm = Agent(name="pm", role="manager", model="anthropic/claude-sonnet-4-20250514")
coder = Agent(name="coder", role="implementer", model="anthropic/claude-opus-4-0-20250514")

society = Society("mixed-models")
society.add(pm, coder)

# The backend's default_model is the fallback
llm = LiteLLMBackend(default_model="anthropic/claude-sonnet-4-20250514")
runtime = LocalRuntime(llm)

Auto-Detection

auto_detect_llm() is a convenience function that creates a LiteLLMBackend by scanning your environment variables. It is the fastest way to go from zero to a working LLM backend -- no model strings, no provider selection.

Basic usage

from claw import auto_detect_llm

llm = auto_detect_llm()  # Just works if any API key is set

Detection order

The function checks environment variables in this order and uses the first match:

| Priority | Environment variable | Model selected |
| --- | --- | --- |
| 1 | ANTHROPIC_API_KEY | anthropic/claude-sonnet-4-20250514 |
| 2 | GEMINI_API_KEY or GOOGLE_API_KEY | gemini/gemini-2.5-flash |
| 3 | OPENAI_API_KEY | openai/gpt-4o |

The LITELLM_MODEL environment variable acts as a top-priority override. When set, it is used as the model string directly, bypassing the detection order entirely. This is useful for non-standard providers or pinning a specific model:

export LITELLM_MODEL="anthropic/claude-opus-4-0-20250514"
export ANTHROPIC_API_KEY="sk-ant-..."

If no LITELLM_MODEL is set and no API key is found, auto_detect_llm() raises RuntimeError with a message listing the expected variables.
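The detection logic described above can be sketched as a pure function over an environment mapping. This is illustrative only (the real auto_detect_llm() reads os.environ and constructs the backend):

```python
def detect_model(env: dict) -> str:
    """Sketch of the documented detection order, over a plain dict."""
    # LITELLM_MODEL is a top-priority override that bypasses detection.
    if env.get("LITELLM_MODEL"):
        return env["LITELLM_MODEL"]
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic/claude-sonnet-4-20250514"
    if env.get("GEMINI_API_KEY") or env.get("GOOGLE_API_KEY"):
        return "gemini/gemini-2.5-flash"
    if env.get("OPENAI_API_KEY"):
        return "openai/gpt-4o"
    raise RuntimeError(
        "No LLM API key found. Set ANTHROPIC_API_KEY, GEMINI_API_KEY, "
        "GOOGLE_API_KEY, or OPENAI_API_KEY (or LITELLM_MODEL)."
    )
```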

Parameters

auto_detect_llm() accepts the same tuning parameters as LiteLLMBackend:

llm = auto_detect_llm(temperature=0.7, max_tokens=8192)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| temperature | float | 0.0 | Sampling temperature |
| max_tokens | int | 4096 | Maximum tokens in the response |

When to use auto_detect_llm() vs explicit LiteLLMBackend

Use auto_detect_llm() when:

  • You want the quickest path to a working society (quickstarts, demos, examples)
  • You don't care which specific model is used
  • You want your code to work across machines with different providers configured

Use LiteLLMBackend directly when:

  • You need a specific model (e.g., claude-opus-4-0-20250514 for complex reasoning)
  • You need per-agent model overrides (see Per-agent models)
  • You are deploying to production and want explicit control over the model

Full example

import asyncio
from claw import Agent, Society, Delegation, LocalRuntime, auto_detect_llm

pm = Agent(name="pm", role="project manager")
coder = Agent(name="coder", role="implementer")

s = Society(name="quick-demo")
s.connect(pm, coder, Delegation())

llm = auto_detect_llm()

async def main():
    runtime = LocalRuntime(llm)
    result = await runtime.run(s, "Write hello.py")
    print(result.status)

asyncio.run(main())

MockLLM

MockLLM is the testing backend. It returns deterministic, scripted responses without making any API calls. Use it in unit tests, integration tests, and local development.

Basic usage

from claw import MockLLM, LLMResponse, ToolCall

llm = MockLLM()

# Script responses keyed by system prompt pattern
llm.script("You are pm,", responses=[
    LLMResponse(tool_calls=[
        ToolCall(name="emit_event", arguments={
            "event_type": "task_delegated",
            "target_agent": "coder",
            "data": {"task": "Write hello.py"},
        })
    ]),
    LLMResponse(content="acknowledged"),
])

Pattern matching

Patterns match against the compiled system prompt using substring matching first, then regex. Since compiled prompts start with "You are agent <name>, ...", you can use the agent name as a pattern to uniquely identify agents:

llm.script("dev", responses=[LLMResponse(content="I'll implement this")])
llm.script("reviewer", responses=[LLMResponse(content="LGTM")])

You can also match on the event description using the event_pattern keyword:

llm.script(
    "reviewer",
    event_pattern="review_requested",
    responses=[
        LLMResponse(tool_calls=[ToolCall(name="approve", arguments={"comment": "LGTM"})]),
    ],
)

When both agent_pattern and event_pattern are provided, the more specific match wins. Matching priority:

  1. Agent + event pattern match (most specific)
  2. Event pattern only
  3. Agent pattern only
  4. Default fallback
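The matching rules above can be approximated in a short sketch. Both the substring-then-regex matcher and the priority scoring are illustrative reconstructions of the documented behavior, not the SDK's actual implementation:

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Substring match first, then regex (mirrors the documented order)."""
    if pattern in text:
        return True
    try:
        return re.search(pattern, text) is not None
    except re.error:
        return False

def pick_script(scripts, system: str, event: str):
    """Hypothetical selector implementing the documented priority.
    Each script is an (agent_pattern, event_pattern, responses) tuple;
    either pattern may be None."""
    def score(s):
        agent_pat, event_pat, _ = s
        agent_ok = agent_pat is not None and matches(agent_pat, system)
        event_ok = event_pat is not None and matches(event_pat, event)
        if agent_pat and event_pat:
            return 4 if (agent_ok and event_ok) else 0  # most specific
        if event_pat:
            return 3 if event_ok else 0
        return 2 if agent_ok else 0
    best = max(scripts, key=score, default=None)
    return best if best is not None and score(best) > 0 else None
```

For example, with an agent-only script, an event-only script, and an agent+event script all registered, a prompt matching all three selects the agent+event script.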

Sequential responses

Each call to a matching pattern consumes the next response in the list. When the list is exhausted, the last response repeats:

llm = MockLLM()
llm.script("dev", responses=[
    LLMResponse(content="first"),
    LLMResponse(content="second"),
    LLMResponse(content="third"),
])

prompt = Prompt(system="You are agent dev")
r1 = await llm.complete(prompt)  # "first"
r2 = await llm.complete(prompt)  # "second"
r3 = await llm.complete(prompt)  # "third"
r4 = await llm.complete(prompt)  # "third" (repeats last)
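The consume-then-repeat-last behavior amounts to a bounded index over the response list. A minimal sketch, using plain strings in place of LLMResponse objects:

```python
class ScriptedResponses:
    """Sketch of MockLLM's per-pattern sequencing: consume responses in
    order, then repeat the last one once the list is exhausted."""
    def __init__(self, responses):
        self.responses = list(responses)
        self.index = 0

    def next(self):
        # Clamp to the last element when exhausted.
        i = min(self.index, len(self.responses) - 1)
        self.index += 1
        return self.responses[i]

    def reset(self):
        # Restart the sequence from the first response.
        self.index = 0

seq = ScriptedResponses(["first", "second", "third"])
out = [seq.next() for _ in range(4)]  # ["first", "second", "third", "third"]
seq.reset()
again = seq.next()  # "first"
```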

Default response

When no script matches, MockLLM returns a default response. The default is LLMResponse(content="acknowledged"), but you can customize it:

llm = MockLLM(default=LLMResponse(content="custom default"))

Call recording

MockLLM always records every call. Use this to inspect what the runtime sent to the LLM:

llm = MockLLM()

# ... run the society ...

# Inspect all recorded calls
for call in llm.calls:
    print(call["system"][:50])  # system prompt
    print(call["event"])         # event description
    print(call["model"])         # model string passed by runtime

# Total call count
print(llm.call_count)

# Filter calls by agent
dev_calls = llm.calls_for("dev")
reviewer_calls = llm.calls_for("reviewer")

Reset

Reset call history and sequence counters between tests:

llm.reset()
assert llm.call_count == 0

After a reset, scripted sequences start over from the first response.


Prompt Structure

The Prompt dataclass is the structured request format that all backends receive. The Claw compiler builds prompts from the society graph, and the runtime passes them to the LLM backend.

from claw import Prompt, ContextBlock, ContextType, Message

prompt = Prompt(
    system="You are a code reviewer...",
    context=[
        ContextBlock(type=ContextType.ARTIFACT, content="...file contents..."),
        ContextBlock(type=ContextType.INSTRUCTION, content="Be concise."),
    ],
    event="Review this PR submission.",
    tools=[{"name": "approve", "description": "Approve the PR", "parameters": {}}],
    messages=[],  # Multi-turn conversation history
)

Fields

| Field | Type | Purpose |
| --- | --- | --- |
| system | str | Compiled system prompt (generated by the compiler from the society graph) |
| context | list[ContextBlock] | Artifact content, instructions, edge summaries, event history |
| event | str | The triggering event description and data |
| tools | list[dict] | Available tool schemas in OpenAI function-calling format |
| messages | list[Message] | Multi-turn conversation history (used by the ReAct loop) |

Context types

Each ContextBlock is tagged with a ContextType for traceability:

| ContextType | Description |
| --- | --- |
| ARTIFACT | Content from a versioned artifact |
| EVENT_HISTORY | Previous events visible to this agent |
| EDGE_SUMMARY | Summary of the agent's relationships |
| INSTRUCTION | Agent-specific or edge-specific instructions |

The optional source field on ContextBlock records where the context came from (e.g., the artifact name or edge ID).


LLMResponse

The LLMResponse dataclass is what backends return:

from claw import LLMResponse, ToolCall, Usage

response = LLMResponse(
    content="I've reviewed the code and it looks good.",
    tool_calls=[
        ToolCall(name="approve", arguments={"comment": "LGTM"}, id="call_1"),
    ],
    usage=Usage(input_tokens=1500, output_tokens=200),
)

Fields

| Field | Type | Description |
| --- | --- | --- |
| content | str | Text response from the LLM |
| tool_calls | list[ToolCall] | Structured tool invocations |
| usage | Usage | Token usage statistics (input_tokens, output_tokens) |

The convenience properties response.input_tokens and response.output_tokens are shortcuts for response.usage.input_tokens and response.usage.output_tokens.


Multi-Turn Conversation

The ReAct tool loop uses Message objects to feed tool results back to the LLM across multiple turns. The runtime manages this automatically, but understanding the structure is useful for testing and debugging.

from claw import Message, ToolCall

messages = [
    Message(role="assistant", content="Let me edit the file.", tool_calls=[
        ToolCall(
            name="file_edit",
            arguments={"action": "write", "path": "hello.py", "content": "print('hello')"},
            id="call_1",
        ),
    ]),
    Message(role="tool", content="File written: hello.py", tool_call_id="call_1"),
]

Message roles

| Role | When | Key fields |
| --- | --- | --- |
| "assistant" | LLM response with tool calls | content, tool_calls |
| "tool" | Tool execution result | content, tool_call_id |
| "user" | User input (rare in agent loops) | content |

The tool_call_id field on tool-result messages links the result back to the specific ToolCall that produced it, using the id field from the ToolCall.
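This id-based linkage can be demonstrated with plain dicts (stand-ins for the SDK's Message and ToolCall types; the pairing logic is illustrative):

```python
def pair_tool_results(messages):
    """Link each tool-result message back to the tool call that produced
    it, by matching tool_call_id against the assistant's ToolCall ids."""
    calls = {}
    for msg in messages:
        for call in msg.get("tool_calls", []):
            calls[call["id"]] = call["name"]
    pairs = []
    for msg in messages:
        if msg.get("role") == "tool":
            pairs.append((calls.get(msg["tool_call_id"]), msg["content"]))
    return pairs

history = [
    {"role": "assistant", "content": "Let me edit the file.",
     "tool_calls": [{"id": "call_1", "name": "file_edit"}]},
    {"role": "tool", "tool_call_id": "call_1", "content": "File written: hello.py"},
]
paired = pair_tool_results(history)  # [("file_edit", "File written: hello.py")]
```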


Retry Behavior

LocalRuntime includes app-level retry logic for empty LLM responses:

  • Max retries: 3 attempts
  • Backoff schedule: 1s, 2s, 4s (exponential)
  • Trigger: Response has no content and no tool calls

HTTP-level retries (rate limits, transient errors) are handled by LiteLLM internally and are separate from this app-level retry.
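The app-level retry described above amounts to a small backoff loop. This is a sketch, not the runtime's actual code; complete is any async callable, dicts stand in for LLMResponse, and base_delay is added so the example runs fast:

```python
import asyncio

async def complete_with_retry(complete, prompt, *, max_retries=3, base_delay=1.0):
    """Sketch of the documented retry: up to 3 attempts, backing off
    1s, 2s, 4s whenever the response has no content and no tool calls."""
    response = None
    for attempt in range(max_retries):
        response = await complete(prompt)
        if response.get("content") or response.get("tool_calls"):
            return response  # non-empty: done
        if attempt < max_retries - 1:
            await asyncio.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s
    return response  # still empty after all attempts

# Stub backend: empty twice, then a real answer.
answers = [{}, {}, {"content": "done"}]
async def stub(prompt):
    return answers.pop(0)

result = asyncio.run(complete_with_retry(stub, {}, base_delay=0))
# result == {"content": "done"}
```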


Writing a Custom Backend

To implement a custom backend, create a class with an async complete method matching the LLM protocol:

from claw import LLM, Prompt, LLMResponse

class MyCustomBackend:
    async def complete(self, prompt: Prompt, *, model: str = "") -> LLMResponse:
        # Your implementation here
        text = await my_api_call(prompt.system, prompt.event)
        return LLMResponse(content=text)

# Verify it conforms to the protocol
assert isinstance(MyCustomBackend(), LLM)

Then pass it to the runtime:

from claw import LocalRuntime

runtime = LocalRuntime(MyCustomBackend())

No registration step is needed. Any object that implements complete(prompt, *, model) -> LLMResponse works.


Quick Reference

| Backend | Use case | API keys required |
| --- | --- | --- |
| auto_detect_llm() | Quickstarts, demos, examples | Yes (auto-detected from env) |
| LiteLLMBackend | Production, explicit model control | Yes (provider-specific env vars) |
| MockLLM | Unit tests, integration tests | No |
| Custom class | Special providers, local models | Depends on implementation |

Imports:

# Core protocol and data models
from claw import LLM, Prompt, LLMResponse, ToolCall, Message, Usage
from claw import ContextBlock, ContextType

# Backends
from claw import LiteLLMBackend, MockLLM, auto_detect_llm