The technical map of OpenClaw

OpenClaw’s fieldguide is written for the user who wants to know how to operate the system. The engineer who wants to know how the system works needs a different reading. What follows is a structured walk through the building blocks that make OpenClaw what it is. Not an API reference, but a story with headings that makes the mechanics visible.

01 · THE GATEWAY IN DETAIL
Provider-independent messaging abstraction

The Gateway is one Node.js process that orchestrates everything. WebSocket and HTTP combined, all channels managed, agent runtime inside it. The gateway is what makes the claw visible to the outside world.

Channel adapters. Every messaging platform has its own protocol, its own authentication flow, its own rate limits. The Gateway uses one adapter per channel that normalises the external protocol into an internal canonical form. An incoming WhatsApp message via Baileys, a Telegram update via the Bot API, a Discord event via its gateway WebSocket; all three end up in the same internal shape: {channelId, peerId, content, timestamp, attachments, metadata}.
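To make the canonical shape concrete, here is a minimal TypeScript sketch. The six field names come from the text; the field types, the `TelegramUpdateLike` subset and the `normaliseTelegram` function are assumptions made for illustration, not OpenClaw's actual adapter code.

```typescript
// Canonical inbound shape. Field names are from the text; the types are assumed.
interface CanonicalMessage {
  channelId: string;
  peerId: string;
  content: string;
  timestamp: number; // assumed: milliseconds since the Unix epoch
  attachments: string[]; // assumed: references to attachments
  metadata: Record<string, unknown>;
}

// Hypothetical, heavily simplified subset of a Telegram Bot API update.
interface TelegramUpdateLike {
  update_id: number;
  message: { chat: { id: number }; date: number; text?: string };
}

// Hypothetical adapter function: provider-specific in, canonical out.
function normaliseTelegram(update: TelegramUpdateLike): CanonicalMessage {
  return {
    channelId: "telegram",
    peerId: String(update.message.chat.id),
    content: update.message.text ?? "",
    timestamp: update.message.date * 1000, // Telegram sends Unix seconds
    attachments: [],
    metadata: { updateId: update.update_id },
  };
}
```

An adapter for another provider would return the same `CanonicalMessage`, which is what lets everything downstream ignore where a message came from.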

Inbound canonicalisation. When an external event arrives, it walks a pipeline: protocol decode (provider-specific), filter (DM policy, allowlist, mention detection), enrichment (peer lookup, channel config), routing (which agent receives this). The result is an InboundMessage that the runtime picks up.

Outbound de-canonicalisation. The reverse route. The agent produces a response in canonical form; the channel adapter translates it back to the provider-specific protocol. Long responses are split to fit channel limits (4096 characters on Telegram; WhatsApp has a different limit). Streaming is offered where the channel supports it.
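Splitting to a channel limit can be sketched as follows. The function name `splitForChannel` and the preference for breaking at whitespace are assumptions; only the 4096-character Telegram limit comes from the text. The sketch counts UTF-16 code units, which may differ from how a provider counts characters.

```typescript
// Split text into chunks of at most `limit` characters,
// preferring to break at the last newline or space inside the window.
function splitForChannel(text: string, limit: number): string[] {
  if (limit <= 0) throw new RangeError("limit must be positive");
  const chunks: string[] = [];
  let rest = text;
  while (rest.length > limit) {
    const head = rest.slice(0, limit);
    const cut = Math.max(head.lastIndexOf("\n"), head.lastIndexOf(" "));
    if (cut > 0) {
      chunks.push(rest.slice(0, cut));
      rest = rest.slice(cut + 1); // drop the separator itself
    } else {
      chunks.push(head); // no usable break point: hard cut
      rest = rest.slice(limit);
    }
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```

For Telegram the call would be `splitForChannel(reply, 4096)`; each chunk is then sent as its own message.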

State per channel. The gateway keeps state per channel: connection status, last seen message id, rate limit budget. On network failure or provider outage the system responds with exponential backoff and automatic reconnection. Outbound messages get a durable lifecycle (queued, sending, sent, failed) so a gateway restart does not lose messages.
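The durable lifecycle reads naturally as a small state machine. In the sketch below the four state names come from the text; the transition table, the re-queue of failed messages and the recovery rule after a restart are assumptions, not a description of the actual implementation.

```typescript
type OutboundState = "queued" | "sending" | "sent" | "failed";

// Assumed transition table; the text only lists the four states.
const allowedTransitions: Record<OutboundState, OutboundState[]> = {
  queued: ["sending"],
  sending: ["sent", "failed"],
  sent: [],
  failed: ["queued"], // assumption: a failed message may be queued again
};

// Returns the new state, or throws when the transition is not allowed.
function advance(current: OutboundState, next: OutboundState): OutboundState {
  if (allowedTransitions[current].indexOf(next) === -1) {
    throw new Error(`illegal transition ${current} -> ${next}`);
  }
  return next;
}

// Assumed recovery rule: anything caught mid-send at restart goes back to the queue.
function recoverAfterRestart(states: OutboundState[]): OutboundState[] {
  return states.map((s): OutboundState => (s === "sending" ? "queued" : s));
}
```

Persisting the state on every transition is what would let a restarted gateway pick up where it stopped; the sketch leaves that storage out.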

Error handling per provider. Discord and Telegram have official Bot APIs with predictable error codes. WhatsApp via Baileys is more fragile; session expiry and pairing resets require explicit retry paths. iMessage via osascript has platform-specific quirks.

02 · THE RUNTIME AND ITS STATE
How memory coherence between turns works

The Agent Runtime lives inside the gateway. For each incoming message the router picks which agent receives it; the runtime then takes over.

Session state. A session is tied to a (channel, peer) combination. The runtime keeps per session: active LLM context (system prompt plus message history), available tools, memory handles, running tool calls. At each turn this state is mutated; between turns it is serialised to disk.

Compaction. When a session approaches the context window limit, automatic compaction kicks in: important facts are written to memory/YYYY-MM-DD.md, older turns get summarised, and the active context stays manageable. Compaction checkpoints (configurable via sessions.compactionCheckpointOptions) keep snapshots so rollback remains possible.

Soul and Memory in code. SOUL.md, USER.md, AGENTS.md, HEARTBEAT.md are mounted into the system prompt at each turn. MEMORY.md is included up to 20,000 characters, after which truncation takes place with a marker. Daily notes in memory/ are not in the prompt automatically; the agent can request them via the memory_search tool.
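The MEMORY.md cap can be pictured like this. The 20,000-character limit is from the text; the marker string and the choice to append the marker on top of the limit, rather than counting it towards the limit, are assumptions.

```typescript
const MEMORY_CHAR_LIMIT = 20000; // from the text
const TRUNCATION_MARKER = "\n[... MEMORY.md truncated ...]"; // hypothetical marker text

// Returns the content unchanged when it fits, otherwise the first
// `limit` characters followed by the truncation marker.
function clampMemory(content: string, limit: number = MEMORY_CHAR_LIMIT): string {
  if (content.length <= limit) return content;
  return content.slice(0, limit) + TRUNCATION_MARKER;
}
```

The marker matters more than it seems: without it the model cannot tell a short memory file from a cut-off one.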

Restart resilience. Sessions are persisted to disk; on a gateway restart the runtime can restore an active session. The command:new event triggers a new session; /reset or /new triggers the same plus a memory flush. Compaction checkpoints make rollback between turns possible.

03 · THE TOOL SYSTEM
Registration, invocation and error protocol

Tools are OpenClaw’s primitive actions. Read a file, run a bash command, open a URL, send a message.

Registration. Tools are discovered at gateway start: built-in tools (file, bash, web, memory) plus tools brought by active skills. Per tool: a name, a description (LLM-readable), an input schema (JSON Schema), and a handler.

Invocation protocol. The agent presents the list of available tools to the LLM; the LLM picks a tool plus arguments; the runtime validates the arguments against the schema; the handler runs. Output is structured: {ok: true, result} or {ok: false, error}. Errors are marked so the LLM can distinguish them from legitimate content.
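The result envelope maps well onto a discriminated union. In this sketch the `{ok, result}` and `{ok, error}` shapes are from the text; `ToolDefinition`, `invokeTool` and the `echoTool` example are hypothetical. The `validate` function stands in for real JSON Schema validation, and the handler is synchronous to keep the example short.

```typescript
type ToolResult<T> = { ok: true; result: T } | { ok: false; error: string };

interface ToolDefinition<A, T> {
  name: string;
  description: string; // read by the LLM
  validate: (args: unknown) => args is A; // stand-in for JSON Schema validation
  handler: (args: A) => T;
}

// Validate first, then run the handler; every failure becomes {ok: false}.
function invokeTool<A, T>(tool: ToolDefinition<A, T>, args: unknown): ToolResult<T> {
  if (!tool.validate(args)) {
    return { ok: false, error: `invalid arguments for ${tool.name}` };
  }
  try {
    return { ok: true, result: tool.handler(args) };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}

// Hypothetical example tool.
const echoTool: ToolDefinition<{ text: string }, string> = {
  name: "echo",
  description: "Returns the given text unchanged.",
  validate: (args): args is { text: string } =>
    typeof args === "object" &&
    args !== null &&
    typeof (args as { text?: unknown }).text === "string",
  handler: (args) => args.text,
};
```

Because the handler never throws past `invokeTool`, the LLM always receives one of the two shapes and can branch on `ok`.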

Partial failure. When a tool partially succeeds (reading three files of which one fails), the result payload contains both successes and failures. The LLM can then decide whether to go on with what did work.

Tool policies and gating. Before a tool is invoked, the runtime runs a policy check: is this tool allowed in the current profile? Does it require explicit user approval? Which argument validations apply? The tools.profile, tools.deny, exec.security, exec.ask settings act here.
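A policy gate can be reduced to a three-way decision. The `ToolPolicy` shape below is hypothetical and does not mirror the actual structure of the tools.profile, tools.deny, exec.security or exec.ask settings; it only illustrates the order of checks, with an explicit deny winning over everything else.

```typescript
type PolicyDecision = "allow" | "ask" | "deny";

// Hypothetical, flattened view of the effective policy for one session.
interface ToolPolicy {
  allowed: string[]; // tools enabled by the active profile
  denied: string[]; // explicit deny list
  needsApproval: string[]; // tools that require user confirmation
}

function decide(tool: string, policy: ToolPolicy): PolicyDecision {
  if (policy.denied.indexOf(tool) !== -1) return "deny"; // deny always wins
  if (policy.allowed.indexOf(tool) === -1) return "deny"; // not in the profile
  if (policy.needsApproval.indexOf(tool) !== -1) return "ask";
  return "allow";
}
```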

Tool call result shape. A defined response shape per tool: {summary, details, attachments}. The LLM gets the summary field directly; details can be retrieved through a follow-up question. That keeps the context window manageable when tool outputs are large.

04 · THE SKILL SYSTEM
Bundles of instruction plus tools plus scripts

Skills are reusable abilities. A skill is a folder with a SKILL.md (instruction plus metadata) and optionally scripts.

Discovery. At gateway start the runtime scans the workspace skills folder (~/.openclaw/workspace/skills/) plus built-in skills. Per skill, SKILL.md is parsed: frontmatter with name, description, requires (env vars, external binaries), policy. A skill whose requires cannot be satisfied is marked ineligible.
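The eligibility check is essentially a set difference between what a skill requires and what the host offers. The sketch assumes a parsed frontmatter object; the `requires.env` and `requires.bins` nesting, the `HostFacts` type and the function name are hypothetical.

```typescript
// Hypothetical parsed frontmatter; field names follow the text, the nesting is assumed.
interface SkillManifest {
  name: string;
  description: string;
  requires: { env: string[]; bins: string[] };
}

interface HostFacts {
  env: Record<string, string | undefined>;
  availableBins: string[]; // assumed to be resolved elsewhere, for example by scanning PATH
}

// Lists every unmet requirement; a skill is eligible only when the list is empty.
function checkEligibility(
  skill: SkillManifest,
  host: HostFacts,
): { eligible: boolean; missing: string[] } {
  const missingEnv = skill.requires.env.filter((key) => !host.env[key]);
  const missingBins = skill.requires.bins.filter(
    (bin) => host.availableBins.indexOf(bin) === -1,
  );
  const missing = missingEnv
    .map((key) => `env:${key}`)
    .concat(missingBins.map((bin) => `bin:${bin}`));
  return { eligible: missing.length === 0, missing };
}
```

Returning the list of missing items, not just a boolean, makes it possible to tell the user why a skill is unavailable.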

Two execution paths. Tool dispatch for skills that simply point to a built-in tool (fast, deterministic, no LLM needed). Model invocation for skills that require contextual decisions (the agent reads the skill instructions, thinks, uses multiple tools as needed).

Parameter bindings. A skill can declare input parameters. When the agent calls the skill, it fills in the parameters based on the conversation context. The runtime validates the parameters against the schema before the handler runs.

Skill policy validation. Before a skill runs, the runtime checks whether the profile allows it. Skill policy validation happens client-side before execution goes to the server; that gives faster feedback than waiting for a server rejection.

Hot reload. Changes in skill files are picked up within about 250ms; no gateway restart needed. That makes iterative skill development pleasant.

The claude code proxy skill. A special skill that makes Claude Code conversationally available via messaging channels. Budget-capped and channel-aware. Output is formatted for the channel you are working in. An example of a skill that itself calls a sub-LLM internally.

05 · HOOK ARCHITECTURE
Event handlers in the runtime lifecycle

Hooks are handlers on gateway events. Unlike skills, which the agent chooses, hooks run automatically when their event fires.

Hook points. The fieldguide lists seven primary events: message:received, message:sent, command:new, session:start, agent:bootstrap, agent:end, gateway:start. For each hook point the runtime supplies an event object with context (channel, peer, content, timing).

Hook signature. A hook is a TypeScript file (handler.ts) with a default export function: async function(event). Returns undefined or a replacement event shape (for message transformations).

Hook failure isolation. When a hook throws, the runtime catches the exception. The event loop continues. Logs record the error. A failing hook does not block messages.
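Failure isolation comes down to a try/catch around each handler call. The sketch below is not the actual runtime code: the `HookEvent` shape, the sequential chaining of handlers and the `log` callback are assumptions. The handler signature (async, returning undefined or a replacement event) follows the text.

```typescript
interface HookEvent {
  type: string;
  content: string;
}

type HookHandler = (event: HookEvent) => Promise<HookEvent | undefined>;

// Runs handlers in order. A throwing handler is logged and skipped;
// a handler that returns an event replaces the current one.
async function runHooks(
  event: HookEvent,
  handlers: HookHandler[],
  log: (message: string) => void,
): Promise<HookEvent> {
  let current = event;
  for (const handler of handlers) {
    try {
      const replaced = await handler(current);
      if (replaced !== undefined) current = replaced;
    } catch (err) {
      log(`hook failed: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  return current;
}
```

The important property is that the caller always gets an event back, so message delivery never depends on every hook behaving.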

Trusted versus untrusted hooks. Events from wake triggers (Gmail Pub/Sub, external webhooks) are marked as untrusted. Tools that only open for trusted content stay closed. This prevents an attacker who can fire a webhook from injecting trusted context.

Built-in webhook mappings. For Gmail, OpenClaw provides a setup helper: openclaw webhooks gmail setup. Under the hood that configures the Pub/Sub topic, push subscription, and the mapping in hooks.mappings of openclaw.json. The canonical config key is hooks.internal.entries; hooks.internal.handlers is only kept for compatibility input.

06 · HEARTBEAT IMPLEMENTATION
How agent life is simulated

The heartbeat loop wakes the claw periodically. An implementation detail that makes the living agent illusion possible.

Timer and wake-up. An interval per agent (every: "30m"), with activeHours that respect quiet times. At the right moment an internal timer fires a wake event to the runtime. The runtime loads the agent state, mounts HEARTBEAT.md as a system extension, and starts a special heartbeat turn.

HEARTBEAT_OK silence. If the agent sees no reason to do or say anything, it replies with HEARTBEAT_OK. The runtime parses that as “no action needed”, stays silent to the user, and only logs that the heartbeat was harmless. That prevents notification spam.
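Parsing the sentinel is a one-line decision. The HEARTBEAT_OK token is from the text; the exact matching rule (whole reply equals the token after trimming whitespace) and the type and function names are assumptions.

```typescript
const HEARTBEAT_OK = "HEARTBEAT_OK";

type HeartbeatOutcome = { kind: "silent" } | { kind: "deliver"; text: string };

// Assumed rule: only a reply that is exactly the token (ignoring
// surrounding whitespace) counts as "no action needed".
function interpretHeartbeatReply(reply: string): HeartbeatOutcome {
  const text = reply.trim();
  if (text === HEARTBEAT_OK) return { kind: "silent" };
  return { kind: "deliver", text };
}
```

A `silent` outcome would only be logged; a `deliver` outcome would be handed to whatever channel the heartbeat target names.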

Target routing. A heartbeat outcome (a message, a reminder) goes somewhere. target: "last" sends to this agent’s last used channel; specific channel ids send there; target: "none" means the result only lands in logs.

Model choice per heartbeat. For heartbeat work a lighter (and cheaper) model is often used: a separate model on the heartbeat config overrides the main choice. This saves cost for the routine work the agent does in its own time.

Heartbeat system prompt section. By default a heartbeat instruction section is included in the system prompt. For agents without a heartbeat loop, includeSystemPromptSection: false can omit this section, which saves tokens.

07 · CRON AND REPEATING SKILLS
Internal scheduling versus external schedulers

OpenClaw has an internal cron system. Why internal, and how does it differ from an external scheduler?

Why internal. An external scheduler does not know which session belongs to it, which skills are relevant, which model may be used, which fallback paths exist. By modelling cron internally, a scheduled job can have the same richness as a manual conversation.

Job types. --session main runs the cron job in the agent’s main session. --session isolated starts a new isolated session without prior context. --session-key "agent:<id>:<channel>:<peer>" runs in a specific existing session; useful for recurring tasks that must stay inside an ongoing thread.

Trigger and delivery. A cron job can send a --system-event (an instruction the agent reads as system context) or a --message (an instruction the agent reads as a user message). The difference is intent: system event is “do something silently”, message is “as if I asked it myself”.

Error handling. On transient errors (rate limits, timeouts, 5xx) the system retries with exponential backoff: 30s, 60s, 5min, 15min, 60min, then a failure alert via a configured channel.
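The retry schedule can be sketched as a lookup table, since the listed delays do not follow a single multiplier. The delays and the categories of transient error are from the text; the function names, the use of HTTP status 429 for rate limits and the `null` return that signals "give up and alert" are assumptions.

```typescript
// Delays from the text: 30s, 60s, 5min, 15min, 60min.
const RETRY_DELAYS_MS = [30 * 1000, 60 * 1000, 5 * 60 * 1000, 15 * 60 * 1000, 60 * 60 * 1000];

// Assumed classification: rate limits (429), timeouts and 5xx are transient.
function isTransient(status: number | "timeout"): boolean {
  if (status === "timeout") return true;
  return status === 429 || (status >= 500 && status <= 599);
}

// `failures` is the number of failed attempts so far (1 or more).
// Returns the wait before the next attempt, or null when the schedule
// is exhausted and a failure alert should be sent instead.
function nextRetryDelay(failures: number): number | null {
  if (failures < 1) throw new RangeError("failures must be at least 1");
  const delay = RETRY_DELAYS_MS[failures - 1];
  return delay === undefined ? null : delay;
}
```

After the first failure `nextRetryDelay(1)` gives 30 seconds; after the sixth it returns `null`, which is the point where the alert goes out.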

Idempotency. Cron jobs that cause side effects themselves (sending mail, adding a post) must be idempotent or explicitly retry-safe by design. Making sure that a re-executed job does not repeat its action takes deliberate design work.
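One common way to get retry safety is an idempotency key per scheduled run. This is a generic technique, not something the source attributes to OpenClaw; the class name and the key format are hypothetical, and a real version would persist the set of keys rather than keep it in memory.

```typescript
class IdempotentRunner {
  private readonly completed = new Set<string>();

  // Runs `action` only if `key` has not completed before.
  // Returns true when the action ran, false when it was skipped.
  runOnce(key: string, action: () => void): boolean {
    if (this.completed.has(key)) return false;
    action();
    // Recorded only after success, so a thrown error leaves the key
    // unrecorded and a later retry will run the action again.
    this.completed.add(key);
    return true;
  }
}
```

A key such as `"daily-report:2024-01-01"` (job name plus scheduled date) means a retry of the same run is skipped while the next day's run still executes. A crash between the action and the record would still cause a repeat, which is why the side effect itself should tolerate duplicates where possible.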

08 · MEMORY AND SOUL
Their implementation and lifecycles

Soul and Memory are architecturally separated. At implementation level:

SOUL.md. A markdown file in the workspace. Mounted unchanged into the system prompt at each turn. No auto updates by the agent; only manual or via an explicit tool call. Truncation above a threshold.

MEMORY.md. Also in the workspace, also in the system prompt, max 20,000 characters. The agent can edit this file itself via a tool call. During compaction, important facts are summarised and stored here.

Daily notes. memory/YYYY-MM-DD.md files, not in the prompt automatically. Accessible via the memory_search tool. The agent writes facts here as conversations happen; on /reset or compaction the current session is summarised into a daily note before being cleared.

Vector index. Optional: a SQLite or QMD index over all daily notes, with embeddings from a configured provider (OpenAI, Gemini, local via Llama). memory_search queries combine vector similarity with text match and time decay.
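How the three signals might combine into one score is sketched below. Only the ingredients (vector similarity, text match, time decay) come from the text; the weighted sum, the exponential half-life decay, the default weights and all function names are assumptions for illustration.

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new RangeError("vectors must have equal length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Fraction of query terms that occur in the text (case-insensitive substring match).
function termOverlap(query: string, text: string): number {
  const terms = query.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
  if (terms.length === 0) return 0;
  const haystack = text.toLowerCase();
  const hits = terms.filter((t) => haystack.indexOf(t) !== -1).length;
  return hits / terms.length;
}

// Exponential decay: the factor halves every `halfLifeDays`.
function timeDecay(ageDays: number, halfLifeDays: number): number {
  return Math.pow(0.5, ageDays / halfLifeDays);
}

// Hypothetical weights; a real system would make these configurable.
function memoryScore(
  input: { similarity: number; overlap: number; ageDays: number },
  weights = { vector: 0.7, text: 0.3, halfLifeDays: 30 },
): number {
  const relevance = weights.vector * input.similarity + weights.text * input.overlap;
  return relevance * timeDecay(input.ageDays, weights.halfLifeDays);
}
```

With these assumptions, two notes that match a query equally well are ranked by age, so yesterday's note outranks one from three months ago.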

Multimodal memory. When multimodal.enabled is on, images and audio can also be indexed. On retrieval this provides cross-modal context.

Memory core lifecycle. Sleep phases, REM preview, grounded REM backfill. Memory core does not write directly to disk; it goes through staged processing, and the REM preview stage can be inspected before anything is persisted. Grounded REM backfill can bring historic daily notes back into active memory when they turn out to be relevant.

Memory wiki and active memory. Two extensions on top of memory core. Wiki pages with backlinks between related notes; ingest/compile/lint pipeline. Active memory is a blocking sub-agent that for every reply automatically retrieves relevant memories and injects them, bounded by a timeout.

09 · WORKSPACE ISOLATION
How Docker sandboxing actually works

Docker sandboxing isolates tool execution. The implementation:

Image. Default node:22-slim. Adjustable via agents.defaults.sandbox.docker.image. The image should be minimal; no extra tooling unless the tools running in the sandbox need it.

Mount strategy. The workspace is mounted read-only. The claw can read its own context but cannot rewrite its personality from a tool. The filesystem outside the workspace is not mounted; tools that try to read outside it get “no such file”.

Network. Default network: "none". Tools that have to go on the web (web_fetch, web_search) route their traffic through the gateway, not via the container. That gives the gateway control over outbound traffic.

Capabilities. capDrop: ["ALL"]. All Linux capabilities are removed. A process in the container has minimal system privileges, even if it runs as root inside the container.

Memory and pids limit. memory: "512m" and pidsLimit: 100. Upper bounds that prevent a runaway tool from eating the host. For heavy data processing these need to be raised.

Cold start. The first invocation is a few seconds slower because the image has to be loaded. Subsequent invocations start within hundreds of milliseconds.

Mode choice. off, non-main (subagents in sandbox, main agent on host), all (everything in sandbox). For public bots all is the right choice; for personal use non-main is the pragmatic middle ground.

10 · RECURRING PATTERNS
Code idioms that characterise OpenClaw

A few patterns you encounter often in the OpenClaw codebase that help define its character.

Registry pattern for pluggable components
Channels, tools, skills and LLM providers are all pluggable. At gateway start a registry is built by discovery; runtime lookups go through that registry. Adding a new channel implementation means a new entry in the registry, not a change in the core routing.
Configuration hierarchy
Settings cascade from defaults to agent level to channel level to call level. `agents.defaults.model` is overridable by `agents.list[i].model`, which is overridable by `channel.model`, which is overridable by per-call flags on the CLI. The lookup is consistent across the entire codebase.
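The cascade is a "most specific value wins" lookup. The sketch below flattens the four levels into one hypothetical `ModelLayers` object; the real configuration is nested as the keys above show, and the model names in the usage note are placeholders.

```typescript
// Hypothetical flattened view of the four levels for one setting.
interface ModelLayers {
  defaults: string; // agents.defaults.model
  agent?: string; // agents.list[i].model
  channel?: string; // channel.model
  call?: string; // per-call CLI flag
}

// The most specific level that is set wins; defaults are the fallback.
function resolveModel(layers: ModelLayers): string {
  return layers.call ?? layers.channel ?? layers.agent ?? layers.defaults;
}
```

For example, `resolveModel({ defaults: "model-a", agent: "model-b" })` yields `"model-b"`, and adding `call: "model-d"` to the same object yields `"model-d"`.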
Event emitter for inter component communication
Channels, agents and hooks talk to each other through a central event emitter. A new incoming message is an event, an agent response is an event, a hook trigger is an event. Loosely coupled, easy to extend, observable.
Atomic state writes
Sessions and memory files are written via temp file plus atomic rename. A crash during a write gives you either the old or the new version, never a partial write. For MEMORY.md and `memory/YYYY-MM-DD.md` this is an important guarantee.
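The temp-file-plus-rename idiom looks roughly like this with Node's synchronous fs API. The function name and the temp-file naming scheme are assumptions; the sketch relies on rename being atomic when source and target are on the same filesystem, which is why the temp file is created next to the target.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Write `data` to a temp file in the target's directory, flush it,
// then rename it over the target in one step.
function writeFileAtomic(target: string, data: string): void {
  const dir = path.dirname(target);
  const tmp = path.join(dir, `.${path.basename(target)}.${process.pid}.${Date.now()}.tmp`);
  const fd = fs.openSync(tmp, "w");
  try {
    fs.writeSync(fd, data);
    fs.fsyncSync(fd); // make sure the bytes are on disk before the rename
  } finally {
    fs.closeSync(fd);
  }
  // If the write above threw, the temp file is left behind and the target is untouched.
  fs.renameSync(tmp, target);
}

// Usage: a reader of demoFile only ever sees "v1" or "v2", never a mix.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "openclaw-demo-"));
const demoFile = path.join(demoDir, "MEMORY.md");
writeFileAtomic(demoFile, "v1");
writeFileAtomic(demoFile, "v2");
```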
Provider abstraction via uniform interface
Anthropic, OpenAI, Gemini, Ollama, Bedrock, and the rest each have their own request/response shape. OpenClaw implements an adapter per provider that maps to a shared `ChatCompletion` interface. The fallback chain and multi-key rotation work on top of this shared interface.

These patterns together do not form an exotic architecture. They are solid Node engineering applied to a specific problem. What characterises OpenClaw is not brilliance per pattern, but the consistent application of well-known patterns to a well-defined domain.

For anyone who wants to dive deeper here: read the source. The codebase is open and the fieldguide sections point in several places to specific modules. Start at the gateway, work outwards: channels, agents, tools, skills, hooks. The mental models from this page make the code easier to read afterwards.