Hetzner + Cloudflare Tunnel for OpenClaw: Hardened Reference Architecture for SetupClaw

A practical reference architecture for OpenClaw on Hetzner with Cloudflare Tunnel: trust-zone route separation, fail-closed defaults, layered auth, and rollback-first operations.

Abstract: A production OpenClaw setup on Hetzner is typically safer when Cloudflare Tunnel is used as controlled ingress, not as a blanket exposure shortcut. This guide outlines a hardened reference architecture that SetupClaw can deliver and customers can actually operate: trust-zone route separation, explicit DNS design, layered auth, private fallback access, and rollback-first operations.

The biggest mistake in tunnel-based deployments is simple. Teams expose everything through one route because it looks convenient, then discover too late that they cannot explain what is public, what is protected, and how to roll back safely.

A hardened SetupClaw pattern starts from the opposite direction. Keep high-privilege OpenClaw control paths private-first. Publish only the minimum routes needed for external workflows. Treat every route as a policy decision with an owner.

This sounds stricter, but it makes operations calmer. You get fewer surprises and faster recovery when something goes wrong.

Start with trust zones, not tunnel commands

Before touching DNS or Cloudflare settings, define trust zones.

A practical baseline has at least two zones. First, an operator control zone for high-privilege actions. Second, an integration ingress zone for narrowly scoped external delivery. These zones should not share the same exposure and policy assumptions.
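
One way to make the zone definitions concrete before any routing work is to write them down as data and check that no two zones share the same assumptions. The following is a minimal sketch; the zone names, exposure labels, and policy descriptions are illustrative assumptions, not OpenClaw or Cloudflare settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustZone:
    name: str
    exposure: str  # assumed labels, e.g. "private" or "public-narrow"
    policy: str    # short description of the access policy for the zone


# Hypothetical zones matching the two-zone baseline described above.
ZONES = [
    TrustZone("operator-control", "private", "break-glass and admin access only"),
    TrustZone("integration-ingress", "public-narrow", "scoped external delivery"),
]


def shared_assumptions(zones):
    """Return pairs of zone names that share both exposure and policy."""
    clashes = []
    for i, a in enumerate(zones):
        for b in zones[i + 1:]:
            if (a.exposure, a.policy) == (b.exposure, b.policy):
                clashes.append((a.name, b.name))
    return clashes
```

An empty result from shared_assumptions(ZONES) means each zone carries its own exposure and policy assumptions, which is the property the baseline asks for.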

Once zones are clear, routing becomes design, not guesswork.

Keep operator control private-first

OpenClaw’s most privileged control surfaces should remain private where possible. Use SSH or Tailscale patterns for break-glass and administrative access.

This gives you a recovery path even if tunnel policy or DNS changes misbehave. It also reduces pressure to weaken security controls during incidents.

Tunnel convenience should never remove private recovery paths.

Separate public routes by purpose

Each public route should have a single purpose and explicit upstream target.

For example, ops.example.com can be reserved for high-trust operator routes while ingress.example.com stays narrow for integration-style ingress.

Do not mix operator UI and integration-style ingress behind one permissive rule. Do not use broad wildcard forwarding that can expose unintended endpoints. Unknown paths should fail closed.
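
The fail-closed rule can be sketched as an explicit allowlist lookup in which anything not listed resolves to a denial. The hostnames come from the example above; the path prefixes, upstream addresses, and function name are assumptions made for illustration only.

```python
from typing import Optional

# Hypothetical allowlist: (hostname, path prefix) -> upstream target.
ROUTES = {
    ("ops.example.com", "/"): "http://127.0.0.1:8080",
    ("ingress.example.com", "/hooks/"): "http://127.0.0.1:9090",
}


def resolve_upstream(hostname: str, path: str) -> Optional[str]:
    """Return the upstream for an explicitly listed route, else None (deny)."""
    # Reject wildcard hostnames and literal parent-directory segments outright.
    # Percent-encoded traversal is not handled in this sketch.
    if "*" in hostname or ".." in path.split("/"):
        return None
    for (host, prefix), upstream in ROUTES.items():
        if hostname == host and path.startswith(prefix):
            return upstream
    # Unknown hostnames and paths fall through to a denial.
    return None
```

The same idea applies at the tunnel layer: cloudflared's ingress configuration expects a final catch-all rule, and pointing that rule at an HTTP 404 status response is a common fail-closed choice.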

Route separation is not bureaucracy. It is the most effective way to reduce blast radius.

DNS strategy is part of security

Use dedicated hostnames per function and avoid catch-all records.

Set rollback-friendly TTLs, short enough that a reverted record takes effect within your recovery target, and document change sequencing so reversions are predictable. DNS choices directly affect incident duration, especially when route changes need to be undone quickly.
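
A small review helper can flag the two DNS problems named here: catch-all records and TTLs that are too long to roll back quickly. This is a sketch under assumptions; the 300-second budget and the record shape are placeholders, not recommended values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DnsRecord:
    name: str
    ttl_seconds: int


# Assumed rollback budget; pick a value that matches your recovery target.
MAX_ROLLBACK_TTL = 300


def dns_findings(records):
    """List human-readable findings for catch-all names and long TTLs."""
    findings = []
    for record in records:
        if record.name.startswith("*."):
            findings.append(f"{record.name}: catch-all record")
        if record.ttl_seconds > MAX_ROLLBACK_TTL:
            findings.append(
                f"{record.name}: TTL {record.ttl_seconds}s exceeds {MAX_ROLLBACK_TTL}s"
            )
    return findings
```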

If DNS planning is skipped, rollback becomes slower than it needs to be.

Cloudflare transport does not replace OpenClaw auth

A secure edge does not remove application-layer risk.

Gateway auth controls still need to be enforced. Telegram policies, allowlists, mention-gating, and route-specific behaviour should remain strict. Tunneling traffic safely is not the same as authorising actions safely.

You need both layers active at all times.
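
The layering can be illustrated as a decision function in which passing the edge is necessary but never sufficient. The field names and sender IDs below are hypothetical and do not reflect OpenClaw's actual configuration keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InboundRequest:
    edge_authenticated: bool   # passed the edge access policy
    gateway_token_valid: bool  # passed application-layer Gateway auth
    sender_id: int             # channel-level identity of the sender
    mentioned_bot: bool        # whether the message explicitly addressed the bot


# Hypothetical allowlist of sender IDs.
ALLOWED_SENDERS = frozenset({1001, 1002})


def is_authorised(req: InboundRequest, require_mention: bool = True) -> bool:
    """Allow an action only when every layer agrees; any failure denies."""
    if not req.edge_authenticated:
        return False
    if not req.gateway_token_valid:
        return False
    if req.sender_id not in ALLOWED_SENDERS:
        return False
    if require_mention and not req.mentioned_bot:
        return False
    return True
```

A request that arrives through a healthy tunnel but carries an invalid Gateway token is still denied, which is the behaviour the two-layer rule is meant to guarantee.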

Keep Hetzner host posture aligned with architecture

Tunnel adoption should not trigger broad host exposure.

Maintain least-exposed host assumptions, align firewall posture with intended routes, and avoid “temporary” port openings during debugging. Temporary exposure often becomes permanent by accident.

Host hardening and ingress hardening should reinforce each other.

Verification before cutover should be mandatory

Do not cut over based on tunnel status alone. Run a full verification checklist:

expected hostnames resolve correctly
expected routes are reachable
unexpected routes are denied
TLS/origin trust validates correctly on intended routes
auth is enforced where required
Telegram control flow remains policy-correct
private fallback access remains available

If any check fails, treat rollout as incomplete.
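
The checklist above can be treated as a gate rather than a guideline. In this sketch the check names are shorthand labels invented for the seven items, and a check with no recorded result counts as a failure.

```python
# Shorthand labels (assumed) for the seven checklist items above.
CHECK_NAMES = [
    "hostnames_resolve",
    "expected_routes_reachable",
    "unexpected_routes_denied",
    "tls_origin_trust_valid",
    "auth_enforced",
    "telegram_flow_policy_correct",
    "private_fallback_available",
]


def cutover_ready(results):
    """Return (ready, failed_checks); missing results are treated as failed."""
    failed = [name for name in CHECK_NAMES if not results.get(name, False)]
    return (not failed, failed)
```

For example, recording six passing checks and omitting the fallback check yields (False, ["private_fallback_available"]), so the rollout stays incomplete until that check is actually run.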

Plan for real failure modes, not ideal ones

A hardened reference architecture needs predefined responses for common failure classes:

tunnel healthy but origin unhealthy
policy mismatch after route edit
overlap between routes exposing the wrong path
certificate or host mismatch behaviour

Each failure mode should have command-level checks and a named owner. This is where most production reliability gains come from.
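
As a sketch of how the four failure classes might be mapped to owners, the function below turns observed signals into a list of (failure class, owner) pairs. The signal names, class labels, and owner names are hypothetical; how each signal is collected is left to your own command-level checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    tunnel_healthy: bool
    origin_healthy: bool
    policy_matches_inventory: bool
    routes_overlap: bool
    cert_matches_host: bool


# Hypothetical owners; every failure class must have one.
OWNERS = {
    "origin-unhealthy": "platform-oncall",
    "policy-mismatch": "security-owner",
    "route-overlap": "security-owner",
    "cert-or-host-mismatch": "platform-oncall",
}


def classify(signals: Signals):
    """Return (failure class, owner) pairs for every class the signals indicate."""
    found = []
    if signals.tunnel_healthy and not signals.origin_healthy:
        found.append("origin-unhealthy")
    if not signals.policy_matches_inventory:
        found.append("policy-mismatch")
    if signals.routes_overlap:
        found.append("route-overlap")
    if not signals.cert_matches_host:
        found.append("cert-or-host-mismatch")
    return [(name, OWNERS[name]) for name in found]
```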

Cron incidents and tunnel incidents should be diagnosed separately

Tunnel issues can disrupt ingress and external delivery. They do not automatically mean cron is broken.

Cron runs in the Gateway process, so include scheduler smoke checks after incidents, but avoid misclassifying edge failures as scheduler failures. Separating diagnosis reduces unnecessary restarts and confusion.
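
The separation can be expressed as a small triage function that reports which components are actually affected. The three inputs are assumed to come from your own checks (an external route probe, a process check, and a scheduler smoke test); the component labels are illustrative.

```python
def triage(edge_reachable: bool, gateway_running: bool, cron_smoke_ok: bool):
    """Return the list of affected components, keeping edge and scheduler apart."""
    if not gateway_running:
        # Cron runs inside the Gateway process, so a stopped Gateway explains
        # scheduler symptoms without blaming the scheduler itself.
        return ["gateway"]
    affected = []
    if not edge_reachable:
        affected.append("edge")
    if not cron_smoke_ok:
        affected.append("scheduler")
    return affected
```

With an unreachable edge but a passing cron smoke test, the result is ["edge"], which argues against restarting the scheduler.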

Use PR-reviewed workflows for route and policy changes

Network and route changes are production changes. Treat them with PR review, not manual edits in high-pressure windows.

PR-only discipline improves auditability, makes rollback easier, and reduces outage-causing drift from ad hoc changes.

Infrastructure reliability improves when infra changes are reviewable.

Practical implementation steps

Step one: define topology and trust zones

Create a simple architecture doc listing zones, route intent, and private fallback paths.

Step two: build route inventory table

For each route, record hostname/path, upstream target, access policy, owner, and rollback step.
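
An inventory row can be modelled so that an incomplete entry is easy to detect before it is merged. This is a minimal sketch; the field names mirror the list above and the example values are placeholders.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RouteEntry:
    hostname_path: str
    upstream: str
    access_policy: str
    owner: str
    rollback_step: str


def missing_fields(entry: RouteEntry):
    """Return the names of fields that are empty or whitespace-only."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name).strip()]


# Placeholder row: owner and rollback step have not been filled in yet.
draft = RouteEntry(
    hostname_path="ingress.example.com/hooks/",
    upstream="http://127.0.0.1:9090",
    access_policy="token required",
    owner="",
    rollback_step="  ",
)
```

Here missing_fields(draft) returns ["owner", "rollback_step"], so the row would be rejected in review until both are recorded.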

Step three: implement DNS and tunnel config with fail-closed defaults

Publish only required routes and ensure unspecified paths are denied.

Step four: validate layered auth and Telegram policy

Confirm tunnel access does not bypass Gateway auth or channel-level restrictions.

Step five: run pre-cutover verification suite

Test route behaviour, auth enforcement, fallback access, and channel continuity end-to-end.

Step six: run rollback drill before production sign-off

Execute a controlled rollback using the documented sequence and verify full recovery.

Rollback pass criteria should be explicit: expected routes restored, unexpected paths denied, auth still enforced, Telegram control flow validated, and cron smoke test passed.

A reference architecture like this is why SetupClaw Basic Setup can stay practical and safe at the same time. It gives customers a clear boundary model they can operate after handoff, rather than a one-off configuration that only works while the original installer is available.