---
title: API Integration
tags: [api, provider, web_request]
updated: 2026-02-24
---
# API Integration
Cowboy communicates with LLM providers through Zellij's WebAccess permission system. The provider layer formats requests and parses responses, while Zellij handles the actual HTTP transport asynchronously.
- Native `web_request()` via Zellij's WebAccess permission (no curl needed)
- Provider abstraction via the Rust `LlmProvider` trait
- Implementations for Claude and OpenAI
- Proxy-aware: when `secretsProxy.enable = true`, the plugin sends dummy API keys; the agent-proxy injects real credentials on the wire (see [[Security]])
## Architecture
The provider layer is split into formatting and parsing -- there is no synchronous `complete()` call. This matches Zellij's event-driven model:

- `format_request()` produces `(url, headers, body)` for `web_request()`
- Zellij delivers `Event::WebRequestResult` asynchronously
- `parse_response()` or `parse_error()` interprets the result
```
+------------------+     +------------------+     +------------------+
|   AgentHarness   | --> |   LlmProvider    | --> |  web_request()   |
|  (send_to_llm)   |     | (format_request) |     |   (Zellij API)   |
+------------------+     +------------------+     +------------------+
         ^                         |
         |           +------------+------------+
 WebRequestResult    |                         |
         |  +----------------+        +----------------+
         +--| ClaudeProvider |        | OpenAIProvider |
            +----------------+        +----------------+
```
When the secrets proxy is active, HTTPS traffic from the agent namespace is DNAT'd to the proxy on the host. The proxy injects real API keys based on domain mappings configured in `services.agent.secretsProxy.domainMappings`. The plugin only holds dummy keys (e.g., `"PROXY_MANAGED"`).
## LlmProvider Trait

The trait deliberately avoids an async or Result-based `complete()`. Providers only know how to serialize requests and deserialize responses:
```rust
pub trait LlmProvider {
    fn name(&self) -> &str;

    fn format_request(
        &self,
        messages: &[Message],
        tools: &[Tool],
        system: &str,
    ) -> (String, BTreeMap<String, String>, Vec<u8>);

    fn parse_response(&self, body: &[u8]) -> Result<LlmResponse, ProviderError>;

    fn parse_error(&self, status: u16, body: &[u8]) -> ProviderError;
}
```
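As a hedged illustration of the split, here is a minimal, dependency-free provider sketch. The `Message`, `Tool`, `LlmResponse`, and `ProviderError` definitions below are simplified stand-ins (not the plugin's actual types), the JSON is hand-rolled rather than serde-serialized, and `SketchClaudeProvider` is hypothetical:

```rust
use std::collections::BTreeMap;

// Simplified stand-ins for the document's types (illustrative only).
pub struct Message { pub role: String, pub text: String }
pub struct Tool;
pub struct LlmResponse { pub text: String }
#[derive(Debug)]
pub struct ProviderError { pub status: u16, pub detail: String }

pub trait LlmProvider {
    fn name(&self) -> &str;
    fn format_request(
        &self,
        messages: &[Message],
        tools: &[Tool],
        system: &str,
    ) -> (String, BTreeMap<String, String>, Vec<u8>);
    fn parse_response(&self, body: &[u8]) -> Result<LlmResponse, ProviderError>;
    fn parse_error(&self, status: u16, body: &[u8]) -> ProviderError;
}

pub struct SketchClaudeProvider { pub model: String }

impl LlmProvider for SketchClaudeProvider {
    fn name(&self) -> &str { "claude" }

    fn format_request(
        &self,
        messages: &[Message],
        _tools: &[Tool],
        system: &str,
    ) -> (String, BTreeMap<String, String>, Vec<u8>) {
        let mut headers = BTreeMap::new();
        headers.insert("content-type".to_string(), "application/json".to_string());
        // Dummy key: under the secrets proxy, the real key is injected on the wire.
        headers.insert("x-api-key".to_string(), "PROXY_MANAGED".to_string());
        // Hand-rolled JSON to keep the sketch dependency-free.
        let msgs: Vec<String> = messages
            .iter()
            .map(|m| format!(r#"{{"role":"{}","content":"{}"}}"#, m.role, m.text))
            .collect();
        let body = format!(
            r#"{{"model":"{}","system":"{}","messages":[{}]}}"#,
            self.model, system, msgs.join(",")
        );
        ("https://api.anthropic.com/v1/messages".to_string(), headers, body.into_bytes())
    }

    fn parse_response(&self, body: &[u8]) -> Result<LlmResponse, ProviderError> {
        // Real parsing would decode the Messages API response; this just
        // checks UTF-8 and passes the raw body through.
        let text = String::from_utf8(body.to_vec())
            .map_err(|e| ProviderError { status: 200, detail: e.to_string() })?;
        Ok(LlmResponse { text })
    }

    fn parse_error(&self, status: u16, body: &[u8]) -> ProviderError {
        ProviderError { status, detail: String::from_utf8_lossy(body).into_owned() }
    }
}
```

Note that nothing here performs I/O: the harness hands the `(url, headers, body)` triple to `web_request()` and parsing happens later, when the event arrives.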
## Message Model

Messages use string roles and structured `ContentBlock` vectors (not a `Role` enum or plain string content). This matches the Claude API's native content block model and supports `text`, `tool_use`, `tool_result`, and `thinking` blocks in a single message:
```rust
pub struct Message {
    pub role: String,               // "user", "assistant"
    pub content: Vec<ContentBlock>,
}

pub struct ContentBlock {
    pub block_type: String,         // "text", "tool_use", "tool_result", "thinking"
    pub text: Option<String>,
    pub thinking: Option<String>,
    pub id: Option<String>,
    pub tool_use_id: Option<String>,
    pub name: Option<String>,
    pub input: Option<Value>,
    pub is_error: Option<bool>,
}
```
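To show how a tool round trip looks in this model, here is a hedged sketch of an assistant `tool_use` turn and the user `tool_result` turn that answers it, correlated by `id`/`tool_use_id`. The `ContentBlock` stand-in below is simplified (it holds `input` as a raw JSON string instead of `serde_json::Value`, and omits `thinking`), and the two helper functions are hypothetical:

```rust
// Simplified stand-in for the document's ContentBlock (illustrative only).
#[derive(Default)]
pub struct ContentBlock {
    pub block_type: String,
    pub text: Option<String>,
    pub id: Option<String>,
    pub tool_use_id: Option<String>,
    pub name: Option<String>,
    pub input: Option<String>, // raw JSON string here, not serde_json::Value
    pub is_error: Option<bool>,
}

pub struct Message {
    pub role: String,
    pub content: Vec<ContentBlock>,
}

// An assistant turn that requests a tool call.
pub fn tool_use_turn(id: &str, name: &str, input_json: &str) -> Message {
    Message {
        role: "assistant".to_string(),
        content: vec![ContentBlock {
            block_type: "tool_use".to_string(),
            id: Some(id.to_string()),
            name: Some(name.to_string()),
            input: Some(input_json.to_string()),
            ..Default::default()
        }],
    }
}

// The user turn that carries the tool's output back, correlated by id.
pub fn tool_result_turn(tool_use_id: &str, output: &str, is_error: bool) -> Message {
    Message {
        role: "user".to_string(),
        content: vec![ContentBlock {
            block_type: "tool_result".to_string(),
            tool_use_id: Some(tool_use_id.to_string()),
            text: Some(output.to_string()),
            is_error: Some(is_error),
            ..Default::default()
        }],
    }
}
```

Because both turns are just `ContentBlock` vectors, text, tool calls, and thinking can coexist in one message without any special casing.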
## Providers

### ClaudeProvider

Targets Anthropic's Messages API (`/v1/messages`). Configuration is stored directly on the provider struct rather than passed per-request:

- `model`: defaults to `claude-sonnet-4-20250514`
- `max_tokens`: defaults to 8192
- `thinking_enabled` / `thinking_budget`: extended thinking support (enabled by default, 10k token budget)
### OpenAIProvider

Supports both the Chat Completions API and the Responses API. The provider auto-selects the Responses API for reasoning models (o1, o3, o4, gpt-5, codex) to capture reasoning summaries. Reasoning summaries are mapped to thinking content blocks for unified handling.

- `model`: defaults to `gpt-4o`
- `base_url`: configurable for compatible APIs
- `reasoning_effort`: `"low"`, `"medium"`, or `"high"` for reasoning models
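The auto-selection could be sketched as a prefix check over the model families listed above. The exact matching rules in cowboy are not documented here, so both function names and the prefix logic are assumptions:

```rust
/// Sketch: should this model go to the Responses API? The document says
/// reasoning models (o1, o3, o4, gpt-5, codex) do; prefix matching is an
/// assumption about how that is decided.
fn uses_responses_api(model: &str) -> bool {
    const REASONING_PREFIXES: [&str; 5] = ["o1", "o3", "o4", "gpt-5", "codex"];
    REASONING_PREFIXES.iter().any(|p| model.starts_with(p))
}

/// Pick the endpoint path accordingly (hypothetical helper).
fn endpoint(base_url: &str, model: &str) -> String {
    if uses_responses_api(model) {
        format!("{base_url}/responses")
    } else {
        format!("{base_url}/chat/completions")
    }
}
```

A configurable `base_url` means the same selection works against OpenAI-compatible servers, as long as they expose the same paths.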
## Request Flow in AgentHarness

`send_to_llm()` in `llm.rs` orchestrates the call:

- Acquires the session lock (`try_acquire_llm`)
- Builds the augmented system prompt with cached memories
- Converts internal `Message` to provider `Message` via `build_api_messages()`
- Calls `format_request()` on the active provider
- Issues `web_request()` with context `cmd_ctx!("type" => "llm_request")`
The response arrives via `handle_llm_response()` in `handlers.rs`:

- Releases the LLM lock
- On non-200: calls `parse_error()` and pushes the error as an assistant message
- On 200: calls `parse_response()` and extracts tool calls via `parse_tool_calls()`
- If tool calls are present: queues them and begins execution
- If no tool calls and running as a subagent: auto-submits the response
- If no tool calls and a pub/sub reply is pending: sends the reply
## Error Handling

`ProviderError` variants include `ParseError`, `ApiError`, `NetworkError`, `RateLimited`, `Timeout`, `InvalidRequest`, `AuthenticationError`, and `Overloaded`. Each error knows whether it is retryable via `is_retryable()`.

A `RetryState` struct provides exponential backoff (1s initial, 30s max, 3 attempts), but retry orchestration at the harness level is minimal -- failed LLM calls currently surface as error messages rather than being automatically retried.
Retryable errors: rate limits (429), network errors, server errors (5xx), timeouts. Non-retryable: auth errors (401/403), client errors (4xx), parse errors.
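A hedged sketch of this taxonomy and the backoff schedule follows. The variant and method names (`is_retryable()`, `RetryState`) come from the text; the variant payloads, field names, and the exact doubling schedule are assumptions:

```rust
use std::time::Duration;

/// Error taxonomy from the text; payloads are assumed for illustration.
#[derive(Debug, PartialEq)]
pub enum ProviderError {
    ParseError(String),
    ApiError(u16),
    NetworkError(String),
    RateLimited,
    Timeout,
    InvalidRequest,
    AuthenticationError,
    Overloaded,
}

impl ProviderError {
    pub fn is_retryable(&self) -> bool {
        match self {
            // Rate limits (429), network errors, timeouts, overload: retry.
            ProviderError::RateLimited
            | ProviderError::NetworkError(_)
            | ProviderError::Timeout
            | ProviderError::Overloaded => true,
            // Server errors (5xx) retry; other statuses do not.
            ProviderError::ApiError(status) => (500..=599).contains(status),
            // Auth errors, invalid requests, and parse errors: no retry.
            _ => false,
        }
    }
}

/// Exponential backoff: 1s initial, 30s cap, 3 attempts.
pub struct RetryState {
    attempts: u32,
}

impl RetryState {
    pub fn new() -> Self {
        RetryState { attempts: 0 }
    }

    /// Next backoff delay, or None once the 3 attempts are exhausted.
    pub fn next_delay(&mut self) -> Option<Duration> {
        if self.attempts >= 3 {
            return None;
        }
        let secs = (1u64 << self.attempts).min(30); // 1s, 2s, 4s, ...
        self.attempts += 1;
        Some(Duration::from_secs(secs))
    }
}
```

Since the harness does not yet wire up auto-retry, `is_retryable()` currently only informs how the error is surfaced, not whether the call is replayed.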
## Context Correlation

Request context uses `BTreeMap<String, String>` (via the `cmd_ctx!` macro), not a typed enum. The `"type"` key distinguishes request kinds: `"llm_request"`, `"exa_search"`, `"summarization"`, `"compaction"`.
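The correlation pattern could be sketched as follows: build the map when issuing `web_request()`, then dispatch on `"type"` when the `WebRequestResult` event comes back. The `cmd_ctx!` macro is cowboy-internal, so a plain builder stands in for it here, and all handler names except `handle_llm_response` are hypothetical:

```rust
use std::collections::BTreeMap;

/// Stand-in for cmd_ctx!("type" => "llm_request"): a plain BTreeMap builder.
fn llm_request_ctx() -> BTreeMap<String, String> {
    let mut ctx = BTreeMap::new();
    ctx.insert("type".to_string(), "llm_request".to_string());
    ctx
}

/// Dispatch on the "type" key of a returned context map. Only
/// "handle_llm_response" is named in the text; the rest are illustrative.
fn dispatch(ctx: &BTreeMap<String, String>) -> &'static str {
    match ctx.get("type").map(String::as_str) {
        Some("llm_request") => "handle_llm_response",
        Some("exa_search") => "handle_search_response",
        Some("summarization") => "handle_summarization_response",
        Some("compaction") => "handle_compaction_response",
        _ => "ignore",
    }
}
```

The trade-off versus a typed enum is that unknown keys fail at runtime (falling through to the catch-all) rather than at compile time, in exchange for trivially extensible string contexts.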
## Nix Configuration

```nix
services.agent.llm = {
  provider = "claude"; # claude | openai | local
  model = "claude-sonnet-4-20250514";
  apiKeyPath = "/run/agenix/ai_key"; # ignored when secretsProxy.enable
};
```
When `services.agent.secretsProxy.enable = true`, the API key path is irrelevant -- the proxy handles credential injection.
## Relevance to agent-pkgs

The provider abstraction is internal to cowboy. Bridge services in agent-pkgs (discord, email) interact with cowboy through the pub/sub source system, not through the LLM provider layer. Messages from bridge services arrive via `poll_message_sources()` and are processed as user input; responses flow back through `send_pubsub_reply()`.
## Status vs. Design Doc
The original design doc proposed:
- A synchronous-looking `complete()` method -- not implemented; the actual trait uses the `format_request()` / `parse_response()` split
- A `RequestConfig` struct passed per-call -- not implemented; config lives on the provider structs
- A `Role` enum for messages -- not implemented; string roles with `Vec<ContentBlock>`
- An `ApiState` state machine with a `pending_requests` HashMap -- not implemented; context correlation uses `BTreeMap` via Zellij events
- An `LlmClient` wrapper -- not implemented
- Elaborate retry scheduling in the harness update loop -- partially implemented (`RetryState` exists, but auto-retry on LLM calls is not wired up)
Implemented but not in original design:
- Extended thinking support (Claude and OpenAI reasoning summaries)
- OpenAI Responses API for reasoning models
- Tool use as first-class content blocks
- Summarization pipeline using a separate LLM call
- Session locking around LLM calls
- Memory augmentation of system prompts
## Related
- [[Security]] -- Secrets proxy and credential injection
- [[Context Engineering]] -- Prompt caching and token budget
- [[Nix Integration]] -- Provider configuration via Nix modules