title: API Integration
tags: [api, provider, web_request]
updated: 2026-02-24

API Integration

Cowboy communicates with LLM providers through Zellij's WebAccess permission system. The provider layer formats requests and parses responses, while Zellij handles the actual HTTP transport asynchronously.

  • Native web_request() via Zellij's WebAccess permission (no curl needed)
  • Provider abstraction via Rust LlmProvider trait
  • Implementations for Claude, OpenAI
  • Proxy-aware: when secretsProxy.enable = true, the plugin sends dummy API keys; the agent-proxy injects real credentials on the wire (see [[Security]])

Architecture

The provider layer is split into formatting and parsing -- there is no synchronous complete() call. This matches Zellij's event-driven model:

  1. format_request() produces (url, headers, body) for web_request()
  2. Zellij delivers Event::WebRequestResult asynchronously
  3. parse_response() or parse_error() interprets the result
+------------------+     +------------------+     +------------------+
|  AgentHarness    | --> |   LlmProvider    | --> |   web_request()  |
|  (send_to_llm)   |     | (format_request) |     |   (Zellij API)   |
+------------------+     +------------------+     +------------------+
        ^                         |                        |
        |                +--------+-----------+            |
        |                |                    |            |
        |        +----------------+   +----------------+   |
        |        | ClaudeProvider |   | OpenAIProvider |   |
        |        +----------------+   +----------------+   |
        |                                                  |
        +------------- Event::WebRequestResult ------------+

When the secrets proxy is active, HTTPS traffic from the agent namespace is DNAT'd to the proxy on the host. The proxy injects real API keys based on domain mappings configured in services.agent.secretsProxy.domainMappings. The plugin only holds dummy keys (e.g., "PROXY_MANAGED").

LlmProvider Trait

The trait deliberately avoids an async or blocking complete() method. Providers only know how to serialize requests and deserialize responses:

pub trait LlmProvider {
    fn name(&self) -> &str;

    fn format_request(
        &self,
        messages: &[Message],
        tools: &[Tool],
        system: &str,
    ) -> (String, BTreeMap<String, String>, Vec<u8>);

    fn parse_response(&self, body: &[u8]) -> Result<LlmResponse, ProviderError>;

    fn parse_error(&self, status: u16, body: &[u8]) -> ProviderError;
}
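A minimal implementation sketch of the trait's contract: serialize on the way out, deserialize on the way back, no I/O in the provider itself. The stub types and the EchoProvider below are illustrative, not cowboy's actual code; real providers build JSON bodies.

```rust
use std::collections::BTreeMap;

// Stub types standing in for cowboy's real ones.
pub struct Message { pub role: String, pub text: String }
pub struct Tool;
pub struct LlmResponse { pub text: String }
#[derive(Debug)]
pub enum ProviderError { ParseError(String) }

pub trait LlmProvider {
    fn name(&self) -> &str;
    fn format_request(
        &self,
        messages: &[Message],
        tools: &[Tool],
        system: &str,
    ) -> (String, BTreeMap<String, String>, Vec<u8>);
    fn parse_response(&self, body: &[u8]) -> Result<LlmResponse, ProviderError>;
    fn parse_error(&self, status: u16, body: &[u8]) -> ProviderError;
}

// Hypothetical provider: formats a request, never performs the HTTP call itself.
pub struct EchoProvider;

impl LlmProvider for EchoProvider {
    fn name(&self) -> &str { "echo" }

    fn format_request(
        &self,
        messages: &[Message],
        _tools: &[Tool],
        system: &str,
    ) -> (String, BTreeMap<String, String>, Vec<u8>) {
        let mut headers = BTreeMap::new();
        headers.insert("content-type".into(), "application/json".into());
        // Real providers serialize JSON here; plain text keeps the sketch short.
        let last = messages.last().map(|m| m.text.as_str()).unwrap_or("");
        let body = format!("{}\n{}", system, last);
        ("https://example.invalid/v1/echo".into(), headers, body.into_bytes())
    }

    fn parse_response(&self, body: &[u8]) -> Result<LlmResponse, ProviderError> {
        String::from_utf8(body.to_vec())
            .map(|text| LlmResponse { text })
            .map_err(|e| ProviderError::ParseError(e.to_string()))
    }

    fn parse_error(&self, _status: u16, body: &[u8]) -> ProviderError {
        ProviderError::ParseError(String::from_utf8_lossy(body).into_owned())
    }
}
```

Because the trait is synchronous and side-effect free, it needs no async runtime: the harness hands the tuple to web_request() and the provider is not involved again until a result event arrives.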

Message Model

Messages use string roles and structured ContentBlock vectors (not a Role enum or plain string content). This matches the Claude API's native content block model and supports text, tool_use, tool_result, and thinking blocks in a single message:

pub struct Message {
    pub role: String,           // "user", "assistant"
    pub content: Vec<ContentBlock>,
}

pub struct ContentBlock {
    pub block_type: String,     // "text", "tool_use", "tool_result", "thinking"
    pub text: Option<String>,
    pub thinking: Option<String>,
    pub id: Option<String>,
    pub tool_use_id: Option<String>,
    pub name: Option<String>,
    pub input: Option<Value>,
    pub is_error: Option<bool>,
}
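A sketch of how one user message can mix block types, for instance plain text plus a tool result. The convenience constructors are hypothetical, and `Value` is stubbed as a string here where the real field holds a JSON value:

```rust
// Stand-in for the JSON value type used by the real `input` field.
type Value = String;

#[derive(Default)]
pub struct ContentBlock {
    pub block_type: String,
    pub text: Option<String>,
    pub thinking: Option<String>,
    pub id: Option<String>,
    pub tool_use_id: Option<String>,
    pub name: Option<String>,
    pub input: Option<Value>,
    pub is_error: Option<bool>,
}

pub struct Message {
    pub role: String,
    pub content: Vec<ContentBlock>,
}

impl ContentBlock {
    // Hypothetical convenience constructors.
    pub fn text(s: &str) -> Self {
        ContentBlock { block_type: "text".into(), text: Some(s.into()), ..Default::default() }
    }

    pub fn tool_result(tool_use_id: &str, output: &str, is_error: bool) -> Self {
        ContentBlock {
            block_type: "tool_result".into(),
            tool_use_id: Some(tool_use_id.into()),
            text: Some(output.into()),
            is_error: Some(is_error),
            ..Default::default()
        }
    }
}
```

Keeping block_type as a string rather than an enum means unknown block kinds coming back from a provider pass through without a deserialization failure.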

Providers

ClaudeProvider

Targets Anthropic's Messages API (/v1/messages). Configuration is stored directly on the provider struct rather than passed per-request:

  • model: defaults to claude-sonnet-4-20250514
  • max_tokens: defaults to 8192
  • thinking_enabled / thinking_budget: extended thinking support (enabled by default, 10k token budget)
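A sketch of the kind of request this configuration produces. The endpoint and header names follow Anthropic's Messages API; the config struct, function, and hand-built JSON body are illustrative, not cowboy's actual code:

```rust
use std::collections::BTreeMap;

pub struct ClaudeConfig {
    pub model: String,
    pub max_tokens: u32,
    pub thinking_enabled: bool,
    pub thinking_budget: u32,
}

impl Default for ClaudeConfig {
    fn default() -> Self {
        ClaudeConfig {
            model: "claude-sonnet-4-20250514".into(),
            max_tokens: 8192,
            thinking_enabled: true,
            thinking_budget: 10_000,
        }
    }
}

pub fn claude_request(cfg: &ClaudeConfig, api_key: &str, system: &str)
    -> (String, BTreeMap<String, String>, String)
{
    let mut headers = BTreeMap::new();
    headers.insert("x-api-key".into(), api_key.into()); // dummy when proxied
    headers.insert("anthropic-version".into(), "2023-06-01".into());
    headers.insert("content-type".into(), "application/json".into());

    // Body assembled by hand to stay dependency-free; real code uses a JSON library.
    let thinking = if cfg.thinking_enabled {
        format!(r#","thinking":{{"type":"enabled","budget_tokens":{}}}"#, cfg.thinking_budget)
    } else {
        String::new()
    };
    let body = format!(
        r#"{{"model":"{}","max_tokens":{},"system":"{}","messages":[]{}}}"#,
        cfg.model, cfg.max_tokens, system, thinking
    );
    ("https://api.anthropic.com/v1/messages".into(), headers, body)
}
```

Note that the API key header is populated even in proxy mode; it simply carries the dummy value that the secrets proxy later replaces on the wire.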

OpenAIProvider

Supports both Chat Completions API and Responses API. The provider auto-selects the Responses API for reasoning models (o1, o3, o4, gpt-5, codex) to capture reasoning summaries. Reasoning summaries are mapped to thinking content blocks for unified handling.

  • model: defaults to gpt-4o
  • base_url: configurable for compatible APIs
  • reasoning_effort: "low", "medium", or "high" for reasoning models
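The auto-selection rule can be sketched as a model-name prefix check (the exact matching logic in cowboy may differ; the endpoint paths are OpenAI's published ones):

```rust
// Reasoning-model families routed to the Responses API; everything else
// uses the classic Chat Completions endpoint.
const REASONING_PREFIXES: [&str; 5] = ["o1", "o3", "o4", "gpt-5", "codex"];

pub fn uses_responses_api(model: &str) -> bool {
    REASONING_PREFIXES.iter().any(|p| model.starts_with(p))
}

pub fn endpoint_path(model: &str) -> &'static str {
    if uses_responses_api(model) { "/v1/responses" } else { "/v1/chat/completions" }
}
```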

Request Flow in AgentHarness

send_to_llm() in llm.rs orchestrates the call:

  1. Acquires session lock (try_acquire_llm)
  2. Builds augmented system prompt with cached memories
  3. Converts internal Message to provider Message via build_api_messages()
  4. Calls format_request() on the active provider
  5. Issues web_request() with context cmd_ctx!("type" => "llm_request")

The response arrives asynchronously via handle_llm_response() in handlers.rs:

  1. Releases LLM lock
  2. On non-200: calls parse_error() and pushes error as assistant message
  3. On 200: calls parse_response(), extracts tool calls via parse_tool_calls()
  4. If tool calls present: queues them and begins execution
  5. If no tool calls and running as subagent: auto-submits response
  6. If no tool calls and pub/sub reply pending: sends reply
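The branching above can be condensed into a dispatch on status and state. Names here are hypothetical; the real handler also releases the session lock and drives tool execution:

```rust
pub enum LlmOutcome {
    ExecuteTools(Vec<String>), // queued tool calls
    AutoSubmit(String),        // subagent: submit response upward
    Reply(String),             // pending pub/sub reply
    Display(String),           // plain assistant message
    Error(String),             // parse_error() result shown as assistant message
}

// Hypothetical condensation of handle_llm_response().
pub fn dispatch(status: u16, text: String, tool_calls: Vec<String>,
                is_subagent: bool, reply_pending: bool) -> LlmOutcome {
    if status != 200 {
        return LlmOutcome::Error(format!("provider error (HTTP {})", status));
    }
    if !tool_calls.is_empty() {
        LlmOutcome::ExecuteTools(tool_calls)
    } else if is_subagent {
        LlmOutcome::AutoSubmit(text)
    } else if reply_pending {
        LlmOutcome::Reply(text)
    } else {
        LlmOutcome::Display(text)
    }
}
```

The ordering matters: tool calls take priority, so a subagent that requests a tool keeps running instead of auto-submitting a partial answer.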

Error Handling

ProviderError variants include ParseError, ApiError, NetworkError, RateLimited, Timeout, InvalidRequest, AuthenticationError, and Overloaded. Each variant reports whether it is retryable via is_retryable().

A RetryState struct provides exponential backoff (1s initial, 30s max, 3 attempts), but retry orchestration at the harness level is minimal -- failed LLM calls currently surface as error messages rather than being automatically retried.

Retryable errors: rate limits (429), network errors, server errors (5xx), timeouts. Non-retryable: auth errors (401/403), client errors (4xx), parse errors.
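A sketch of the classification and backoff schedule described above. The variant set comes from this page; the field shapes, the treatment of ApiError as a server-side (retryable) error, and the exact doubling schedule are assumptions:

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub enum ProviderError {
    ParseError, ApiError, NetworkError, RateLimited,
    Timeout, InvalidRequest, AuthenticationError, Overloaded,
}

impl ProviderError {
    pub fn is_retryable(&self) -> bool {
        matches!(self,
            ProviderError::RateLimited    // 429
            | ProviderError::NetworkError
            | ProviderError::ApiError     // assumed server-side 5xx
            | ProviderError::Timeout
            | ProviderError::Overloaded)
    }
}

// Exponential backoff: 1s initial, doubling, capped at 30s, 3 attempts.
pub struct RetryState { pub attempt: u32 }

impl RetryState {
    pub const MAX_ATTEMPTS: u32 = 3;

    pub fn next_delay(&self) -> Option<Duration> {
        if self.attempt >= Self::MAX_ATTEMPTS {
            return None;
        }
        let secs = (1u64 << self.attempt).min(30); // 1s, 2s, 4s, ... capped at 30s
        Some(Duration::from_secs(secs))
    }
}
```

With three attempts the 30s cap is never reached; it only matters if MAX_ATTEMPTS is raised.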

Context Correlation

Request context uses BTreeMap<String, String> (via the cmd_ctx! macro), not a typed enum. The "type" key distinguishes request kinds: "llm_request", "exa_search", "summarization", "compaction".
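A plausible shape for the macro and the dispatch on the "type" key; the real cmd_ctx! may differ:

```rust
use std::collections::BTreeMap;

// Builds the BTreeMap<String, String> context attached to web_request() calls.
macro_rules! cmd_ctx {
    ($($k:expr => $v:expr),* $(,)?) => {{
        let mut ctx: BTreeMap<String, String> = BTreeMap::new();
        $( ctx.insert($k.to_string(), $v.to_string()); )*
        ctx
    }};
}

// Inspected when Event::WebRequestResult delivers the context back.
pub fn request_kind(ctx: &BTreeMap<String, String>) -> &str {
    ctx.get("type").map(String::as_str).unwrap_or("unknown")
}
```

The trade-off versus a typed enum: string maps round-trip through Zellij's event plumbing without extra serialization code, at the cost of compile-time checking on the keys.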

Nix Configuration

services.agent.llm = {
  provider = "claude";  # claude | openai | local
  model = "claude-sonnet-4-20250514";
  apiKeyPath = "/run/agenix/ai_key";  # ignored when secretsProxy.enable
};

When services.agent.secretsProxy.enable = true, the API key path is irrelevant -- the proxy handles credential injection.

Relevance to agent-pkgs

The provider abstraction is internal to cowboy. Bridge services in agent-pkgs (discord, email) interact with cowboy through the pub/sub source system, not through the LLM provider layer. Messages from bridge services arrive via poll_message_sources() and are processed as user input; responses flow back through send_pubsub_reply().

Status vs. Design Doc

The original design doc proposed:

  • A synchronous-looking complete() method -- not implemented; the actual trait uses format_request() / parse_response() split
  • RequestConfig struct passed per-call -- not implemented; config lives on provider structs
  • Role enum for messages -- not implemented; string roles with Vec<ContentBlock>
  • ApiState state machine with pending_requests HashMap -- not implemented; context correlation uses BTreeMap via Zellij events
  • LlmClient wrapper -- not implemented
  • Elaborate retry scheduling in the harness update loop -- partially implemented (RetryState exists, but auto-retry on LLM calls is not wired up)

Implemented but not in original design:

  • Extended thinking support (Claude and OpenAI reasoning summaries)
  • OpenAI Responses API for reasoning models
  • Tool use as first-class content blocks
  • Summarization pipeline using a separate LLM call
  • Session locking around LLM calls
  • Memory augmentation of system prompts
  • [[Security]] -- Secrets proxy and credential injection
  • [[Context Engineering]] -- Prompt caching and token budget
  • [[Nix Integration]] -- Provider configuration via Nix modules