# Operator Guide
For operators deploying ZeroDDS to production. Seven sections covering the full life-cycle.
## Production Deployment
Three deployment modes:
- Linux systemd — `.deb`/`.rpm` packages ship pre-built unit files. `systemctl enable zerodds-<bridge>`.
- macOS launchd — Homebrew installs `org.zerodds.<bridge>.plist`. `brew services start zerodds`.
- Windows Service — WiX MSI registers each daemon via `sc.exe`. Run `Install-Services.ps1` as administrator.
## Configuration Files
Each daemon reads `/etc/zerodds/<daemon>.yaml`. Schema:

```yaml
listen: 0.0.0.0:8080
domain: 0
tls:
  cert: /etc/zerodds/tls/server.crt
  key: /etc/zerodds/tls/server.key
  client_ca: /etc/zerodds/tls/clients.crt  # optional, mTLS
auth:
  mode: bearer  # none | bearer | jwt | mtls | sasl-plain
  tokens: /etc/zerodds/tokens.txt
acl:
  default: deny
  rules:
    - { subject: "user:alice", op: read, topic: "Sensors/*" }
    - { subject: "group:editors", op: write, topic: "Commands/#" }
metrics:
  enabled: true
  address: 127.0.0.1:9090
```
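To make the ACL semantics concrete, here is a minimal sketch of default-deny rule evaluation. The matching semantics are assumptions, not from the ZeroDDS source: `*` is treated as a glob wildcard and the MQTT-style `#` as a match-anything suffix.

```python
from fnmatch import fnmatch

# Hypothetical rule set mirroring the config example above.
RULES = [
    {"subject": "user:alice", "op": "read", "topic": "Sensors/*"},
    {"subject": "group:editors", "op": "write", "topic": "Commands/#"},
]

def topic_matches(pattern: str, topic: str) -> bool:
    # Assumption: treat the MQTT-style "#" multi-level wildcard as a glob "*".
    return fnmatch(topic, pattern.replace("#", "*"))

def allowed(subject: str, op: str, topic: str, default: str = "deny") -> bool:
    # First matching rule grants access; otherwise fall back to the default.
    for rule in RULES:
        if (rule["subject"] == subject and rule["op"] == op
                and topic_matches(rule["topic"], topic)):
            return True
    return default == "allow"
```

With `default: deny`, any subject/op/topic combination not covered by an explicit rule is rejected, which is why the hardening section below recommends whitelisting.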
## Monitoring
Built-in observability:
- Prometheus — `/metrics` on `:9090` per daemon. Counters: `frames_in/out_total`, `bytes_in/out_total`, `connections_active`, `dds_samples_in/out_total`, `errors_total`.
- OpenTelemetry — OTLP/HTTP/JSON exporter via `zerodds-observability-otlp`. Standard histograms: `dds.write.latency`, `dds.read.latency`, `dds.heartbeat.rtt`, `dds.discovery.match.duration`.
- Catalog — `/catalog` returns JSON: service name, version, topics + QoS profiles.
- Health — `/healthz` returns 200 OK while ready, 503 during shutdown.
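If you script against `/metrics` directly rather than through Prometheus, a tiny parser for the text exposition format is enough. This is a sketch only: it ignores `HELP`/`TYPE` lines, timestamps, and label-value escaping, and the sample payload below is illustrative, not real ZeroDDS output.

```python
import re

# Illustrative /metrics payload (made-up values).
SAMPLE = """\
# HELP connections_active Currently open client connections
# TYPE connections_active gauge
connections_active 12
frames_in_total{proto="ws"} 48211
errors_total 3
"""

def parse_metrics(text: str) -> dict:
    """Map 'name{labels}' -> value, skipping comments and blank lines."""
    metrics = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        m = re.match(r'([A-Za-z_:][A-Za-z0-9_:]*)(\{[^}]*\})?\s+(\S+)', line)
        if m:
            name, labels, value = m.groups()
            metrics[name + (labels or "")] = float(value)
    return metrics
```

Useful for quick shell checks or alert prototypes before a real Prometheus scrape job is in place.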
## Backup & Recovery
Use the zerodds-recorder-bridge daemon to capture topics to .zddsrec files. zerodds-replay reads recordings and republishes at scaled wallclock for disaster recovery, demo replay, or regression testing. State that survives is in the recordings; daemons themselves are stateless beyond TLS keys.
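The "scaled wallclock" replay boils down to simple timing math. The `.zddsrec` record layout and `zerodds-replay` flags are not documented here; this sketch only illustrates how inter-sample delays scale.

```python
def playback_delays(timestamps_s, scale=1.0):
    """Seconds to wait before republishing each recorded sample.

    timestamps_s: recording timestamps in seconds, ascending.
    scale: > 1.0 replays faster than recorded, < 1.0 slower.
    """
    delays = []
    prev = timestamps_s[0]
    for t in timestamps_s:
        delays.append((t - prev) / scale)  # gap to previous sample, rescaled
        prev = t
    return delays
```

A replay loop would sleep each delay, then republish the corresponding sample on its original topic.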
## Security Hardening
- Pin TLS to TLS 1.3 only (rustls 0.23 default).
- Use mTLS for daemon-to-daemon links; pure bearer is acceptable for browser clients.
- Default ACL to `deny`; whitelist explicit rules.
- Rotate TLS certificates via SIGHUP (`RotatingTlsConfig` reloads without dropping connections).
- Run daemons under unprivileged users with systemd `ProtectSystem=strict`, `NoNewPrivileges=true`.
- Enable DDS-Security plugins via Participant QoS for end-to-end RTPS-AAD.
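One detail worth getting right if you build tooling around the bearer-token file (`auth.tokens`): compare tokens in constant time. A sketch, assuming one token per line (the real file format may differ):

```python
import hmac

def load_tokens(path):
    """Assumed format: one bearer token per line, blank lines ignored."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def token_valid(presented: str, tokens) -> bool:
    # hmac.compare_digest avoids leaking token contents via timing,
    # unlike a plain == comparison that exits on the first mismatch.
    return any(hmac.compare_digest(presented, t) for t in tokens)
```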
## Capacity Planning
Two questions to answer before sizing: how much data per second per process, and what is the bottleneck when scaling beyond that. The numbers below are reference points from the in-tree benchmark suite — single-process, no GC pause, on the llvm bench host (AMD Ryzen Threadripper PRO 3955WX, 24 cores, Linux 6.1 vanilla). Use them as upper-bound guidance: your application's serialisation, QoS profile and broker configuration will dominate well before the bridge does.
| Path | Reference rate | Bottleneck when scaling |
|---|---|---|
| DDS over RTPS UDP (LAN) | ~4 GiB/s payload | NIC, kernel UDP buffer |
| DDS over shared memory | < 5 µs roundtrip | cache-line traffic, polling vs. eventfd wake-up |
| WebSocket bridge | ~250k frames / s | TLS + per-frame allocation |
| MQTT bridge | ~100k messages / s | upstream broker (mosquitto / HiveMQ) |
| gRPC bridge | ~50k unary calls / s | HTTP/2 connection & HPACK table churn |
| AMQP bridge | ~100k messages / s | broker (RabbitMQ) flow-control credits |
### Sizing guidance
- Scale horizontally first — a single bridge daemon saturates roughly one core's worth of TLS plus framing. Run one daemon per protocol per node, not one giant daemon for everything.
- QoS dominates — Reliable + KeepAll + small history depth produces back-pressure long before bandwidth runs out. Pick KeepLast(N) with N matching your reader's worst-case lag.
- Memory budget per participant — about 2 MB plus history-cache (sample size × N × instances). Plan for 100–500 participants per host comfortably.
- Discovery cost — SPDP traffic is O(participants²) on the multicast group. Above a few hundred participants, partition by domain ID or use a unicast discovery peer-list.
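The memory-budget bullet above turns into straightforward arithmetic. A back-of-envelope sketch (the 2 MB base and the 50% headroom factor are the guide's rule of thumb and an assumption, respectively, not measured values):

```python
MB = 1024 * 1024

def participant_memory_bytes(sample_bytes, history_depth, instances,
                             base_bytes=2 * MB):
    """Base cost plus history cache: sample size x KeepLast(N) x instances."""
    return base_bytes + sample_bytes * history_depth * instances

def participants_per_host(host_ram_bytes, per_participant_bytes,
                          headroom=0.5):
    # Reserve `headroom` of RAM for the OS, sockets and transport buffers.
    return int(host_ram_bytes * (1 - headroom) // per_participant_bytes)
```

For example, 1 KiB samples with KeepLast(16) over 8 instances cost just over 2 MB per participant, so a 32 GiB host with 50% headroom lands comfortably inside the 100-500 participants-per-host guidance with room to spare.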
## Upgrade Path
Within 1.x: drop-in replacement, no QoS changes required, wire stays RTPS 2.5. Across major versions: see the migration note in the corresponding release.