Configuration
Reference and diagnostic hub for environment variables and on-disk paths: database URLs, ENGINE_URL, LLM credentials, *_ROOT artifact directories, worker tuning, recovery timeouts, and OpenTelemetry. Use it after the demo arc when wiring production-like settings, and when a demo fails because .env, *_ROOT, or host vs container paths do not line up. The symptom tables on each demo page and Troubleshooting are the first stop; this page is the full env catalog and path map.
Copy .env.example to .env at the repo root and set the variables in the tables below. Process environment variables override .env values.
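The precedence rule can be pictured with a short sketch. This is only an illustration of "process environment wins over .env", not Warden's actual loader; parse_dotenv and effective_settings are hypothetical helper names, and the parser handles plain KEY=VALUE lines only.

```python
def parse_dotenv(text):
    """Parse simple KEY=VALUE lines; skip blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def effective_settings(dotenv_text, process_env):
    """Overlay the process environment on top of .env values (process env wins)."""
    merged = parse_dotenv(dotenv_text)
    merged.update(process_env)
    return merged


dotenv = (
    "ENGINE_URL=http://127.0.0.1:8000\n"
    "DB_URL=postgres://admin:password@127.0.0.1:5432/engine_db\n"
)
# A variable exported in the shell replaces the .env value; the rest pass through.
settings = effective_settings(dotenv, {"ENGINE_URL": "http://engine:8000"})
print(settings["ENGINE_URL"])  # http://engine:8000
```

In practice you would pass os.environ as the second argument; a literal dict is used here so the result does not depend on your shell.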
Runtime context
Warden splits work across processes that may not share the same filesystem view. Under make up, engine and worker run in containers while you typically run the CLI on the host.
Rule of thumb: The CLI needs access to manifest YAML (paths you pass to warden deploy -f …). The engine and worker need access to prompts, policies, schemas, and compensations via *_ROOT inside their process environment—not your host ./config/... paths unless Compose mounts those files at the container roots below.
Dev stack (make up)
Engine and worker run in containers; you run warden on the host.
| Setting | Host (CLI) | Engine / worker (containers) |
|---|---|---|
| POSTGRES_DB, POSTGRES_USER, POSTGRES_PASSWORD | .env, read by Compose postgres service | same values bootstrap the DB on first volume init |
| DB_URL | postgres://...@127.0.0.1:5432/engine_db (host CLI, make migrate) | Built by Compose → ...@postgres:5432/... (engine + worker) |
| ENGINE_URL | http://127.0.0.1:8000 (CLI only) | not set; engine/worker use Postgres, not HTTP to each other |
| PROMPTS_ROOT | Leave unset in .env | /app/prompts (set in docker-compose.yml) |
| POLICIES_ROOT | Leave unset in .env | /app/policies |
| SCHEMAS_ROOT | Leave unset in .env | /app/schemas |
| COMPENSATIONS_ROOT | Leave unset in .env | /app/compensations |
Compose mounts your repo's ./config/ tree into each container. The host path and container path are different names for the same files:
| Repo path (host disk) | Mount inside container | Read by |
|---|---|---|
| ./config/prompts/ | /app/prompts | engine (register + validate), worker (execute) |
| ./config/policies/ | /app/policies | engine |
| ./config/schemas/ | /app/schemas | engine |
| ./config/compensations/ | /app/compensations | engine |
.env for Compose: If you set PROMPTS_ROOT=./config/prompts in .env, Compose injects that value into containers via env_file, but ./config/prompts does not exist inside the container filesystem. The compose file already sets PROMPTS_ROOT=/app/prompts and mounts ./config/prompts there. Leave PROMPTS_ROOT, POLICIES_ROOT, SCHEMAS_ROOT, and COMPENSATIONS_ROOT unset in .env for standard make up workflows so containers use those compose-defined paths instead of a host-relative path that does not resolve inside the image.
warden deploy -f config/saga.minimal.yaml reads YAML from your host working tree. After deploy, manifest bodies live in Postgres; prompt, policy, schema, and compensation files stay on disk and must be visible at the container *_ROOT paths above.
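To make the host vs container distinction concrete, here is a small sketch that translates a repo-relative path into the path the engine or worker would see. The mount pairs are copied from the table above; container_path_for is a hypothetical helper for illustration and is not part of Warden.

```python
from pathlib import PurePosixPath

# Mount pairs from the table above: repo-relative directory -> container directory.
MOUNTS = {
    "config/prompts": "/app/prompts",
    "config/policies": "/app/policies",
    "config/schemas": "/app/schemas",
    "config/compensations": "/app/compensations",
}


def container_path_for(host_relative):
    """Return the container path for a repo-relative file, or None if the
    file does not live under one of the artifact directories."""
    path = PurePosixPath(host_relative)  # a leading "./" is normalized away
    for host_dir, container_dir in MOUNTS.items():
        try:
            rest = path.relative_to(host_dir)
        except ValueError:
            continue  # not under this mount, try the next one
        return str(PurePosixPath(container_dir) / rest)
    return None


print(container_path_for("./config/prompts/triage.j2"))  # /app/prompts/triage.j2
print(container_path_for("config/saga.minimal.yaml"))    # None
```

The second call returns None because the saga manifest is not an artifact under a *_ROOT directory: the CLI reads it on the host and its body is stored in Postgres at deploy.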
Minimal template (docker-compose.example.yml)
Same host DB_URL, in-compose postgres:5432, and container *_ROOT paths as the dev stack. See Installation → Local development stack.
Variable quick reference
| Variable | Host CLI | In containers |
|---|---|---|
| POSTGRES_DB, POSTGRES_USER, POSTGRES_PASSWORD | .env for Compose substitution | postgres service bootstrap |
| DB_URL | 127.0.0.1:5432 | postgres:5432 (engine + worker) |
| PROMPTS_ROOT | leave unset in .env | /app/prompts |
| POLICIES_ROOT | leave unset in .env | /app/policies |
| SCHEMAS_ROOT | leave unset in .env | /app/schemas |
| COMPENSATIONS_ROOT | leave unset in .env | /app/compensations |
| ENGINE_URL | http://127.0.0.1:8000 (CLI only) | not set |
Database (Compose)
Docker Compose uses two related sets of database variables:
| Variables | Read by | Purpose |
|---|---|---|
| POSTGRES_DB, POSTGRES_USER, POSTGRES_PASSWORD | postgres service | Bootstrap the database and role on first container start |
| DB_URL in .env | Host CLI; optional make migrate on the host | Reach Postgres via the published port 127.0.0.1:5432 |
Compose overrides DB_URL for engine, worker, and migrate — those containers connect to postgres:5432 on the internal network. You do not set their DB_URL in .env.
Both the engine and the worker require DB_URL. The outbox is Postgres-native: the engine writes commands and ingests results; the worker claims work and reports STEP_COMPLETED / STEP_FAILED. Each process must reach the same database—host CLI tools use 127.0.0.1:5432; in Compose, engine and worker use the internal postgres:5432 hostname.
Keep POSTGRES_* and host DB_URL aligned (same user, password, and database name). You can use any values — admin / password / engine_db are the repo defaults, not a hard requirement:
POSTGRES_DB=engine_db
POSTGRES_USER=admin
POSTGRES_PASSWORD=password
DB_URL=postgres://admin:password@127.0.0.1:5432/engine_db
POSTGRES_* is applied only when the Postgres data volume is created. Changing credentials in .env later does not update an existing volume — run make reset or alter the role manually.
Use 127.0.0.1 in host DB_URL, not postgres — the service hostname resolves only inside Compose.
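A quick way to catch drift between POSTGRES_* and the host DB_URL is to parse the URL and compare its parts. The following is a minimal sketch using only the standard library; db_url_mismatches is a hypothetical helper, not a Warden command, and it takes the settings as a plain dict.

```python
from urllib.parse import urlsplit


def db_url_mismatches(env):
    """Return the names of the fields where host DB_URL disagrees with POSTGRES_*."""
    url = urlsplit(env["DB_URL"])
    problems = []
    if url.username != env["POSTGRES_USER"]:
        problems.append("user")
    if url.password != env["POSTGRES_PASSWORD"]:
        problems.append("password")
    if url.path.lstrip("/") != env["POSTGRES_DB"]:
        problems.append("database")
    if url.hostname == "postgres":
        # The Compose service name only resolves inside the Compose network.
        problems.append("host")
    return problems


defaults = {
    "POSTGRES_DB": "engine_db",
    "POSTGRES_USER": "admin",
    "POSTGRES_PASSWORD": "password",
    "DB_URL": "postgres://admin:password@127.0.0.1:5432/engine_db",
}
print(db_url_mismatches(defaults))  # []
```

An empty list means the four values line up; any entry names the field to fix in .env.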
Dev stack (Makefile and ports)
When you use make up, Postgres data persists in the Docker named volume engine_db_data. Stopping containers keeps that volume; deleting it gives you an empty database on the next start.
Make targets
| Target | Effect |
|---|---|
| make up | Start dev compose (db, migrate, engine, worker, jaeger, adminer) |
| make up-db | Postgres only |
| make stop / make down | Stop containers; keep engine_db_data |
| make clean | Stop and delete engine_db_data |
| make reset | make clean then make up |
| make migrate | Apply migrations from the host (DB_URL → 127.0.0.1:5432) |
| make doctor | Service status + recent migrate, engine, and worker logs |
You normally do not run make migrate for a first boot — the one-shot migrate service runs before engine and worker start. Use host-side make migrate when Postgres is up but you are not using the full compose migrate container. If the engine or worker exits with Database schema is not initialized, migrations did not complete — see Troubleshooting.
Default ports (dev stack)
| Service | URL |
|---|---|
| Engine | http://127.0.0.1:8000 |
| Postgres (from host) | 127.0.0.1:5432 |
| Adminer | http://127.0.0.1:8080 |
| Jaeger | http://127.0.0.1:16686 |
Environment variables
| Variable | Consumer | Notes |
|---|---|---|
| DB_URL | engine, worker, migrations | Both engine and worker need Postgres for the outbox loop. Host: 127.0.0.1:5432. Compose containers: built from POSTGRES_* → postgres:5432 |
| POSTGRES_DB, POSTGRES_USER, POSTGRES_PASSWORD | postgres service (Compose) | Must match credentials in host DB_URL |
| ENGINE_URL | host CLI only | http://127.0.0.1:8000 from your machine (published port). Engine and worker containers do not use this variable; they coordinate through Postgres. If you run the CLI inside the Compose network, use http://engine:8000 instead of loopback |
| OPENAI_API_KEY | worker | when provider: openai; not required for provider: local or provider: mock. Checked at step runtime, not at deploy |
| WARDEN_LOCAL_LLM_BASE_URL | worker | OpenAI-compatible local endpoint (optional; defaults to http://localhost:11434/v1) |
| GITHUB_PERSONAL_ACCESS_TOKEN | worker | GitHub MCP demo only (stdio env_inherit) |
| ${ENV:…} in worker manifest | worker | SSE MCP auth: names like COMPANY_MCP_TOKEN referenced in tool_sources[].headers; set on worker, not in YAML |
Under Compose, .env is injected when a container starts. After you add or change worker-scoped variables (OPENAI_API_KEY, WARDEN_LOCAL_LLM_BASE_URL, MCP tokens), restart the worker — not the engine: docker compose up -d worker. A step that already failed with missing credentials stays failed until you retry it or start a new saga instance.
For hosted SSE MCP, declare headers in the worker manifest with ${ENV:VAR} placeholders and set VAR on the worker process. Example worker manifest:
headers:
  Authorization: "Bearer ${ENV:COMPANY_MCP_TOKEN}"
# .env (worker container via env_file)
COMPANY_MCP_TOKEN=your-bearer-token
See MCP and tools → Hosted MCP authentication and Worker manifests.
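The placeholder mechanism can be sketched as a simple substitution over header values. This is an illustration of the idea only: expand_env_placeholders is a hypothetical name, and raising on a missing variable is an assumption rather than documented worker behavior.

```python
import re

# Matches ${ENV:NAME} where NAME looks like an environment variable name.
_PLACEHOLDER = re.compile(r"\$\{ENV:([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env_placeholders(value, env):
    """Replace each ${ENV:NAME} in value with env[NAME]."""

    def replace(match):
        name = match.group(1)
        if name not in env:
            # Assumption: fail loudly instead of sending an empty token.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(replace, value)


header = "Bearer ${ENV:COMPANY_MCP_TOKEN}"
print(expand_env_placeholders(header, {"COMPANY_MCP_TOKEN": "your-bearer-token"}))
# Bearer your-bearer-token
```

Because the secret is looked up in the worker's environment at expansion time, the manifest stored in Postgres only ever contains the placeholder.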
A minimal .env for Compose + host CLI:
POSTGRES_DB=engine_db
POSTGRES_USER=admin
POSTGRES_PASSWORD=password
DB_URL=postgres://admin:password@127.0.0.1:5432/engine_db
ENGINE_URL=http://127.0.0.1:8000
OPENAI_API_KEY=sk-...
For the local Quickstart path, set WARDEN_LOCAL_LLM_BASE_URL and deploy config/worker.local-minimal.yaml:
WARDEN_LOCAL_LLM_BASE_URL=http://host.docker.internal:11434/v1
For the mock demo, omit both OPENAI_API_KEY and WARDEN_LOCAL_LLM_BASE_URL and use provider: mock manifests.
Local LLM under Docker (Ollama)
Warden does not run Ollama in Compose. You install and start Ollama on the host (systemctl start ollama or the desktop app), pull the model named in your worker manifest (ollama pull llama3.2), then point the worker container at that HTTP API.
Two common gotchas when the worker runs in Docker but Ollama runs on the host:
- Ollama binds to loopback only: the default is 127.0.0.1:11434, which containers cannot reach. Set OLLAMA_HOST on the Ollama service so it accepts traffic from the Docker bridge, then restart Ollama:

sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
EOF
sudo systemctl daemon-reload
sudo systemctl restart ollama

Confirm with ss -tlnp | grep 11434; you want 0.0.0.0:11434, not only 127.0.0.1. Binding to all interfaces is fine for local dev; restrict port 11434 with a firewall if the machine is on an untrusted network.

- host.docker.internal on Linux: Docker Desktop defines this hostname automatically; native Linux Docker often does not. The repo's docker-compose.yml (and docker-compose.example.yml) add extra_hosts on the worker service:

extra_hosts:
  - "host.docker.internal:host-gateway"

Recreate the worker after adding it: docker compose up -d worker. If you use a custom Compose file without extra_hosts, set WARDEN_LOCAL_LLM_BASE_URL to your Docker bridge gateway instead (for example http://172.18.0.1:11434/v1; inspect with docker inspect <worker-container> --format '{{range .NetworkSettings.Networks}}{{.Gateway}}{{end}}').
Worker manifest: provider: local and model_name matching ollama list — see config/worker.local-minimal.yaml.
Smoke tests:
# Host — Ollama API up
curl http://127.0.0.1:11434/v1/models
# Worker container — DNS + HTTP to host Ollama
docker compose exec worker python -c "
import socket, urllib.request
print(socket.gethostbyname('host.docker.internal'))
print(urllib.request.urlopen('http://host.docker.internal:11434/v1/models', timeout=5).read()[:200])
"
After changing .env or Compose networking, restart the worker (docker compose up -d worker). Failed saga steps stay FAILED until you retry or start a new instance — see Demo: Quickstart → When a step shows FAILED.
Disk artifact roots
Worker and saga manifest YAML is stored in Postgres when you warden deploy. Prompts, policies, output schemas, and compensation files stay on disk and are resolved at runtime using these variables:
| Variable | Resolves | Consumer |
|---|---|---|
| PROMPTS_ROOT | prompt: foo.j2 → {root}/foo.j2 | engine (register), worker (execute) |
| POLICIES_ROOT | policy: gate.yaml → {root}/gate.yaml (legacy stem gate → {root}/gate.yaml) | engine |
| SCHEMAS_ROOT | output_schema: triage.json → {root}/triage.json | engine (register + saga start) |
| COMPENSATIONS_ROOT | compensation: disburse_undo.yaml → {root}/disburse_undo.yaml | engine (register + saga start) |
Each value is a path relative to the root — subdirectories are allowed (e.g. policy: teams/marketing/gate.yaml). For policy, prefer an explicit path with extension; stem-only refs without an extension still resolve via {ref}.yaml when the exact path is missing (one deploy-time warning per unique legacy ref).
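The policy lookup rule, including the legacy stem fallback, might look like the sketch below. It is a reading of the rule described above, not the engine's code; resolve_policy is a hypothetical name, and the second return value stands in for the deploy-time warning.

```python
import tempfile
from pathlib import Path


def resolve_policy(root, ref):
    """Return (path, used_legacy_fallback) for a policy ref under root."""
    root = Path(root)
    exact = root / ref
    if exact.is_file():
        return exact, False
    # Legacy behavior: a ref without an extension falls back to {ref}.yaml.
    if not Path(ref).suffix:
        legacy = root / f"{ref}.yaml"
        if legacy.is_file():
            return legacy, True
    raise FileNotFoundError(f"{ref!r} not found under {root}")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "teams" / "marketing").mkdir(parents=True)
    (root / "gate.yaml").write_text("rules: []\n")
    (root / "teams" / "marketing" / "gate.yaml").write_text("rules: []\n")

    print(resolve_policy(root, "gate.yaml")[1])                  # False
    print(resolve_policy(root, "teams/marketing/gate.yaml")[1])  # False
    print(resolve_policy(root, "gate")[1])                       # True
```

Only the stem-only ref takes the fallback branch, which is why explicit paths with an extension avoid the warning.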
Repo defaults are ./config/prompts, ./config/policies, ./config/schemas, and ./config/compensations.
Worker tuning
These default to conservative values. Adjust as needed for your workload.
ReAct tool responses can be large. WARDEN_REACT_TOOL_MESSAGE_LIMIT trades token economy against context visibility in the LLM transcript: a lower limit saves context window but clips the tail of tool payloads—the agent may miss data and reason from incomplete context. Set 0 to disable clipping entirely for deep debugging passes. Facts extraction always uses the full tool payload regardless of this limit.
| Variable | Default | Purpose |
|---|---|---|
| WORKER_MAX_IN_FLIGHT | 1 | Max concurrent step commands handled by one worker process (outbox consumer semaphore) |
| WARDEN_REACT_TOOL_MESSAGE_LIMIT | 8000 | Max characters for tool-role messages in the ReAct LLM transcript; 0 disables clipping |
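As a rough picture of the clipping trade-off, the sketch below keeps the head of a tool payload and drops the tail once it exceeds the limit. clip_tool_message is a hypothetical helper; whether the worker appends a truncation marker, and how it treats negative values, are not stated above and are assumptions here.

```python
def clip_tool_message(text, limit=8000):
    """Keep at most `limit` characters from the start of a tool message.

    A limit of 0 disables clipping. Treating negative values like 0 is an
    assumption made for this sketch.
    """
    if limit <= 0 or len(text) <= limit:
        return text
    return text[:limit]


payload = "row-" * 5000  # 20000 characters of tool output
print(len(clip_tool_message(payload)))           # 8000
print(len(clip_tool_message(payload, limit=0)))  # 20000
```

With the default limit the model sees only the first 8000 characters, so anything the agent needs from the end of a large payload is lost from the transcript.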
Throughput and parallelism
Parallelism happens at the worker level, not inside a single saga. The engine still schedules one forward step at a time per saga instance—steps in the same workflow do not run concurrently. Throughput across many saga instances comes from how many worker commands your fleet can execute at once.
Each worker process (typically one workers/main.py per container) polls the worker-commands outbox and runs handlers behind an asyncio semaphore sized by WORKER_MAX_IN_FLIGHT:
| Scale lever | Effect |
|---|---|
| Raise WORKER_MAX_IN_FLIGHT on one process | That process may execute multiple step commands concurrently (different sagas, or different steps that are already queued) |
| Run more worker replicas | More processes compete for outbox rows—each replica usually keeps WORKER_MAX_IN_FLIGHT=1 unless you have a reason to combine both |
Example: four worker containers with default WORKER_MAX_IN_FLIGHT=1 can execute up to four step commands at once, usually for four different saga instances. One container with WORKER_MAX_IN_FLIGHT=4 can do the same from a single process, at the cost of more concurrent LLM/MCP load and DB connections in that process.
If you raise WORKER_MAX_IN_FLIGHT materially (for example above 8), size the worker Postgres pool accordingly.
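The semaphore pattern described in this section can be sketched in a few lines of asyncio. This is a generic illustration, not the code in workers/main.py: run_commands and handle are hypothetical names, and the sleep stands in for LLM or MCP latency.

```python
import asyncio


async def run_commands(commands, max_in_flight):
    """Run all commands, with at most max_in_flight handlers active at once."""
    semaphore = asyncio.Semaphore(max_in_flight)
    running = 0
    peak = 0

    async def handle(command):
        nonlocal running, peak
        async with semaphore:  # wait for a free slot
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for LLM/MCP work
            running -= 1
            return f"{command}:done"

    results = await asyncio.gather(*(handle(c) for c in commands))
    return results, peak


results, peak = asyncio.run(run_commands(["a", "b", "c", "d", "e", "f"], 4))
print(len(results), peak <= 4)  # 6 True
```

Every handler that holds a slot may also hold a database connection, which is the reason to size the Postgres pool together with the limit.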
Recovery timeouts
Background loops in the worker and engine processes recover from worker crashes and outbox consumer stalls. Set these timeouts above worst-case step latency (LLM + MCP) for your manifests.
| Variable | Default | Consumer | Purpose |
|---|---|---|---|
| WORKER_STALE_CLAIM_SECONDS | 1800 | worker | Reap unfinished processed_commands claims |
| WORKER_CLAIM_REAP_INTERVAL_SECONDS | 60 | worker | Claim reap tick interval |
| OUTBOX_STALE_IN_PROGRESS_SECONDS | 1800 | engine, worker | Reap outbox rows stuck IN_PROGRESS |
| OUTBOX_REAP_INTERVAL_SECONDS | 60 | engine, worker | Outbox reap tick interval |
| OUTBOX_REAP_BATCH_SIZE | 20 | engine, worker | Max rows reaped per tick per topic |
Workers use claim_token fencing: superseded handlers log claim_superseded with execution_duration_s instead of emitting duplicate results. Frequent supersession within a few seconds means these timeouts are too low.
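To see why a timeout below step latency causes supersession, consider this in-memory model of claim fencing. It is a teaching sketch only: the real claims live in Postgres, and ClaimTable, claim, and finish are hypothetical names that do not mirror Warden's schema or API.

```python
import itertools


class ClaimTable:
    """Toy model: one claim per command, fenced by a token."""

    def __init__(self, stale_seconds):
        self.stale_seconds = stale_seconds
        self._claims = {}  # command_id -> (token, claimed_at)
        self._tokens = itertools.count(1)

    def claim(self, command_id, now):
        """Return a new token, or None if a fresh claim already exists."""
        current = self._claims.get(command_id)
        if current is not None and now - current[1] < self.stale_seconds:
            return None
        token = next(self._tokens)  # a stale claim is replaced by a new token
        self._claims[command_id] = (token, now)
        return token

    def finish(self, command_id, token):
        """Emit a result only if the caller still holds the current token."""
        current = self._claims.get(command_id)
        if current is None or current[0] != token:
            return "claim_superseded"
        del self._claims[command_id]
        return "result_emitted"


table = ClaimTable(stale_seconds=1800)
first = table.claim("cmd-1", now=0)       # worker A starts a slow step
blocked = table.claim("cmd-1", now=100)   # worker B: claim is still fresh
second = table.claim("cmd-1", now=1800)   # claim is stale, worker B takes over
print(table.finish("cmd-1", first))       # claim_superseded
print(table.finish("cmd-1", second))      # result_emitted
```

If worker A's step legitimately takes longer than the stale threshold, its work is discarded and repeated by worker B, so the threshold should sit above the slowest expected step.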
Operator recovery (after automation): warden saga retry-step / warden saga retry-compensation. --force on commit steps requires --allow-destructive (duplicate side-effect risk). See Saga recovery.
Observability (OpenTelemetry)
Engine and worker export traces over OTLP gRPC at startup. Warden reads these settings from the environment:
| Variable | Default | Consumer | Purpose |
|---|---|---|---|
| OTLP_ENDPOINT | unset | engine, worker | Collector URL (for example http://127.0.0.1:4317 or http://jaeger:4317 in Compose) |
| OTLP_INSECURE | true | engine, worker | Plaintext gRPC when true; set false for TLS |
When OTLP_ENDPOINT is unset, the OpenTelemetry SDK may still honor OTEL_EXPORTER_OTLP_ENDPOINT and other standard OTEL_EXPORTER_OTLP_* variables on the process.
With make up, Compose already points engine and worker at Jaeger (OTLP_ENDPOINT=http://jaeger:4317). Open the UI at http://127.0.0.1:16686 and filter traces on attribute saga.id (your Warden trace_id from warden start saga).
For correlation fields, TLS, vendor headers, and debugging workflows, see Observability.
The open-core kernel ships engine, workers, CEL policies, HITL gates, compensation, and Postgres-backed saga state—enough to build and run governed workflows in your own infrastructure.
For compliance-grade capabilities—forensic audit history, extended operational enforcement, and related enterprise features—see Open Core vs Enterprise.
LLM retries (automated backoff)
WARDEN_LLM_RETRY_* variables configure transient LLM API resilience inside the worker — network blips, rate limits, and short provider outages on each ainvoke call (ReAct turns on react steps, the single call on simple steps). The worker retries with exponential backoff and jitter; it does not restart a failed saga step, re-run compensation, or replace operator actions.
| Variable | Default | Purpose |
|---|---|---|
| WARDEN_LLM_RETRY_ENABLED | true | Toggle backoff wrapper around LLM calls |
| WARDEN_LLM_RETRY_MAX_ATTEMPTS | 3 | Max attempts per LLM call (including the first) |
| WARDEN_LLM_RETRY_BASE_DELAY_S | 1.0 | Initial backoff delay (seconds) |
| WARDEN_LLM_RETRY_MAX_DELAY_S | 60.0 | Cap on backoff delay before jitter |
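How these four values interact can be sketched as a delay schedule. The doubling factor and the "full jitter" strategy below are assumptions chosen for illustration; the table above only states that the delay is exponential, capped, and then jittered. capped_schedule and jittered_schedule are hypothetical names.

```python
import random


def capped_schedule(max_attempts=3, base_delay_s=1.0, max_delay_s=60.0):
    """Delays before each retry, after the cap and before jitter.

    max_attempts includes the first call, so there are max_attempts - 1 delays.
    """
    return [
        min(base_delay_s * (2 ** retry), max_delay_s)
        for retry in range(max_attempts - 1)
    ]


def jittered_schedule(rng, **kwargs):
    """Apply full jitter: pick uniformly between 0 and each capped delay."""
    return [rng.uniform(0, delay) for delay in capped_schedule(**kwargs)]


print(capped_schedule())                        # [1.0, 2.0]
print(capped_schedule(max_attempts=10)[-1])     # 60.0
print(len(jittered_schedule(random.Random())))  # 2
```

With the defaults, one LLM call waits at most about three seconds in total before the error surfaces to the step, which then follows the normal failure path in the table below.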
| If you need… | Use… |
|---|---|
| Backoff on a transient provider error during execution | WARDEN_LLM_RETRY_* (above) |
| An operator to re-run a paused human-gated step | warden review retry — see HITL review |
| Recovery after a forward step is stuck IN_PROGRESS | warden saga retry-step; see Saga recovery |
| Recovery after compensation failed or stalled | warden saga retry-compensation |
| Recovery after the step has already failed (normal path) | Saga FSM and compensation — no env var auto-restarts the step |
What's next
Path mismatches between host and container are the most common deploy failure—start with Runtime context when ./config/... works on the host but registration or execution fails inside Docker. For active errors, see Troubleshooting.
For saga authoring, MCP wiring, and policy design, continue to Guides → Manifests and artifacts. For a lean deployment layout without dev-only services, start from docker-compose.example.yml in the repo root.