2026 OpenClaw Gateway Kubernetes Deployment & Acceptance Runbook: Version Pinning, Resource Quotas, bind=lan, port-forward, and Typical OOM/NotReady Rollback (FAQ + Remote Physical Mac Bare-Metal Contrast)
Platform and SRE teams shipping OpenClaw Gateway on Kubernetes often stall sign-off on image drift, misaligned probes and bind addresses, port-forward smoke tests that do not match real traffic, and whether to roll back or retune on OOM/NotReady. This article provides a scannable Kubernetes vs remote physical Mac matrix, a seven-step runbook, change-ticket-ready thresholds, and a symptom-based FAQ.
1. Introduction: why gateway acceptance on Kubernetes needs network and cgroup evidence
On bare metal, “the port is up” often equals a successful bind. On Kubernetes, the same log line still passes through Service endpoints, kube-proxy (or your CNI datapath), NetworkPolicy, and cgroup memory accounting. If OpenClaw Gateway keeps a 127.0.0.1 mental model from a laptop, you get false negatives: curl into the Pod works while traffic via Service fails, or readiness stays red while the process is alive.
This guide chains evidence you can sign: image digest and Helm values hash, requests/limits aligned with OOM events, bind=lan consistent with targetPort, and port-forward smoke tests cross-checked with in-cluster probes. If you are also evaluating macOS node placement for global latency, use this matrix as the parent template for “same version, two tracks” acceptance.
2. Three pain points: version drift, bind vs probes, quotas and noisy traffic
- Version drift and irreproducibility: Production uses `:latest` or tags without digests; two weeks later the same tag rebuilds with different behavior. Rollbacks cannot prove the old ReplicaSet matches the incident binary.
- Bind address, Service, and probes in conflict: The gateway listens on loopback while readiness hits the Pod IP; or `bind=lan` is correct but NetworkPolicy only allows the Ingress CIDR and kubelet probes are dropped, so NotReady coexists with a traffic blackout.
- Resource quotas and hidden cost: Missing requests let Pods schedule "successfully" until the node packs tight and delayed OOM follows; tiny limits kill the process at tool-call peaks with exit code 137 and little in the logs, making it hard to separate leaks from normal spikes without metrics.
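The first and third pain points can be made concrete in a Deployment fragment. A minimal sketch; the registry path, digest placeholder, and memory figures below are illustrative assumptions, not values from the OpenClaw project:

```yaml
# Hypothetical Deployment fragment: pin by digest, declare explicit resources.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openclaw-gateway
spec:
  template:
    spec:
      containers:
        - name: gateway
          # A digest, not a floating tag: the rollback target is provably this binary.
          image: registry.example.com/openclaw/gateway@sha256:<build-digest>
          resources:
            requests:
              memory: "1536Mi"   # near the observed P95 resident working set
              cpu: "500m"        # keeps the scheduler off already saturated nodes
            limits:
              memory: "2048Mi"   # headroom for tool-call and JSON-buffer spikes
```

With requests set, delayed node-level OOM becomes a visible scheduling constraint instead of a surprise two weeks after launch.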
3. Decision matrix: Kubernetes vs remote physical Mac bare metal
Align “who wins under which constraint” so launchd habits are not pasted verbatim into Pods.
| Dimension | Kubernetes (Deployment + Service) | Remote physical Mac / launchd |
|---|---|---|
| Listen bind | Use `bind=lan` (or `0.0.0.0`) so the Service can reach the Pod; loopback only for same-Pod sidecars | Often `127.0.0.1` behind nginx/Caddy terminating TLS |
| Version pinning | Image digest + chart version + values hash in the change record | Checksum + lockfiles + launchd plist version fields |
| Isolation | cgroup OOMKilled and CPU throttling are auditable | Unified memory and swap policy; watch memory pressure and thermal throttling |
| Ad-hoc acceptance | `kubectl port-forward` for smoke tests, not a substitute for in-cluster paths | Local curl or SSH tunnels; shorter path, fewer replica angles |
| Typical rollback | `kubectl rollout undo` or a pinned previous digest | Replace binary/image tag + `launchctl kickstart -k`; mind single-instance locks |
4. Seven-step runbook (bind=lan and port-forward)
- Pin versions: CI writes the image `repo@sha256:…`, the Helm chart version, and the `values.yaml` git SHA into the ticket; block production pipelines on floating tags.
- Declare resources: Set `requests.memory` near the P95 resident working set; `limits.memory` must cover tool calls and JSON buffers; CPU requests keep the scheduler off already saturated nodes.
- Align bind and ports: If traffic enters via Service/Ingress, configure `bind=lan` (or a documented dual-stack listen) and verify that `containerPort`, `targetPort`, and probe ports match.
- Configure probes: Readiness uses the same protocol/host/path tuple as real traffic; add `initialDelaySeconds` or a `startupProbe` for cold starts so skills loading does not flip the Pod to NotReady.
- port-forward smoke test: From an ops machine run `kubectl port-forward deploy/openclaw-gateway 18789:18789` (replace the port), complete a minimal health check and one tool call; repeat an in-cluster probe and record whether both paths agree.
- Observe and alert: Tie restart count, OOMKilled, readiness=false duration, 5xx rate, and gateway queue depth to one dashboard; keep 24h before/after change windows.
- Bare-metal contrast sign-off: On a remote physical Mac, repeat key health signals with the same digest using native install or Compose, and document deltas—see 2026 OpenClaw on Windows and Linux: PowerShell vs WSL2, Enterprise HTTPS Proxy, Node Pinning, Remote macOS Gateway Runbook for client-to-macOS gateway alignment.
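Steps 3 and 4 of the runbook above can be sketched in one manifest. This is a hedged illustration: the `--bind=lan` flag spelling, the `/healthz` path, and port 18789 are assumptions to adapt to your chart, and the container fragment belongs inside the Deployment's pod template:

```yaml
# Hypothetical container fragment (inside the Deployment pod template):
# align bind address, container port, and probe ports.
containers:
  - name: gateway
    args: ["--bind=lan"]         # assumed flag; must listen beyond loopback
    ports:
      - name: http
        containerPort: 18789     # must equal the Service targetPort
    startupProbe:                # absorbs cold start / skills loading
      httpGet:
        path: /healthz           # assumed health path
        port: http
      failureThreshold: 12
      periodSeconds: 5           # up to ~60s before readiness takes over
    readinessProbe:
      httpGet:
        path: /healthz           # same tuple that real traffic uses
        port: http
      periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: openclaw-gateway
spec:
  selector:
    app: openclaw-gateway
  ports:
    - port: 80
      targetPort: http           # resolves to containerPort 18789 by name
```

Naming the port and referencing it from both probes and the Service is what keeps `containerPort`, `targetPort`, and probe ports from drifting apart in later edits.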
5. Cite-ready thresholds and parameters
- Image pinning: Production tickets include digest and build id; rollbacks cross-check incident timestamps.
- Memory headroom: With observed tool-call spikes, keep limits at least ~25–40% above explainable P95 resident; prefer throttling before blind doubling.
- Probe startup: Gateways with >30s cold start should use startupProbe or ≥40–60s grace, aligned with OpenClaw workspace/skills load time.
- port-forward: Smoke only; sign-off SLOs must include in-cluster Service DNS and Ingress/TLS paths.
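The memory-headroom threshold above can be turned into a small review helper. A minimal sketch, assuming you already export the P95 resident set in MiB from your metrics stack; the function name and the 64 MiB rounding granularity are choices made here, not an OpenClaw convention:

```python
def recommended_memory_limit(p95_resident_mib: float, headroom: float = 0.30) -> int:
    """Derive a limits.memory candidate from the observed P95 resident working set.

    headroom: fraction above P95; the runbook suggests staying in the ~25-40% band
    rather than blindly doubling. Returns MiB rounded up to a 64 MiB increment so
    the value maps cleanly to a Kubernetes limit.
    """
    if not 0.25 <= headroom <= 0.40:
        raise ValueError("keep headroom in the ~25-40% band; retune, don't double")
    raw = p95_resident_mib * (1 + headroom)
    return int(-(-raw // 64) * 64)  # ceiling to the next 64 MiB

# Example: 1400 MiB P95 with 30% headroom -> 1820 MiB, rounded up to 1856 MiB
print(recommended_memory_limit(1400))
```

A change ticket can then record both the observed P95 and the derived limit, making the headroom decision auditable instead of folklore.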
6. FAQ: OOM, NotReady, and rollback
Does bind=lan widen exposure?
Listening on a non-loopback address inside the Pod does not equal public Internet exposure; surface area is defined by Service type, Ingress, NetworkPolicy, and egress policy. Audit “process bind” and “who can route to the Pod” separately.
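Auditing "who can route to the Pod" can be expressed as a NetworkPolicy. A sketch under stated assumptions: the ingress controller namespace label and port 18789 are placeholders, and whether kubelet probe traffic is subject to policy at all depends on your CNI, so verify probes still pass after applying anything like this:

```yaml
# Hypothetical NetworkPolicy: bind=lan inside the Pod, reachability narrowed here.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: openclaw-gateway-ingress
spec:
  podSelector:
    matchLabels:
      app: openclaw-gateway
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx  # assumed ingress namespace
      ports:
        - protocol: TCP
          port: 18789
```

This separation is the point of the FAQ answer: the process bind stays wide inside the Pod, while the routable surface is declared and reviewable in policy.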
First step on OOMKilled?
Read kubectl describe pod Last State, node memory pressure, and exit 137; correlate with concurrent tool calls and payload sizes to avoid mistaking spikes for leaks.
NotReady but logs say listening—what to fix first?
Check probe URL port/path, missing startupProbe, NetworkPolicy blocking kubelet sources, and whether HTTP routes mount only after ready.
How to document rollback for compliance?
Keep the previous digest and Helm revision; after kubectl rollout undo, run in-cluster health and minimal business handshake and attach evidence to the ticket.
7. Why aligning the same gateway version on Mac mini is easier
Kubernetes covers replicas, rolling updates, and quota audits; remote physical Mac nodes—often Mac mini M4—remain the practical default for signing, Screen Sharing, and Apple-ecosystem integration. Running the same digest under launchd before promoting the cluster image reduces late-stage bind and probe surprises.
On macOS, Unix tooling and SSH work out of the box; Apple Silicon unified memory keeps long-lived gateway processes stable versus many small PCs, and ~4W idle-class power makes 7×24 contrast tests affordable. Gatekeeper, SIP, and FileVault reduce malware risk versus typical Windows fleet images. If you want equivalent health signals on cluster and bare metal, a quiet, efficient Mac mini M4 is a strong reference node.
If you are ready to validate this runbook on real hardware, Mac mini M4 is a cost-effective standard contrast node to run in parallel with production clusters.
Use Mac mini as your off-cluster “golden” contrast node
Sign off the same gateway build on remote macOS first, then promote the Kubernetes image—fewer probe and bind incidents.