# Operator Guide
For operators deploying ZeroDDS to production. Seven sections covering the full life-cycle.
## Production Deployment
Three deployment modes:
- Linux systemd — `.deb`/`.rpm` packages ship pre-built unit files. `systemctl enable zerodds-<bridge>`.
- macOS launchd — Homebrew installs `org.zerodds.<bridge>.plist`. `brew services start zerodds`.
- Windows Service — WiX MSI registers each daemon via `sc.exe`. Run `Install-Services.ps1` as administrator.
## Configuration Files
Each daemon reads `/etc/zerodds/<daemon>.yaml`. Schema:

```yaml
listen: 0.0.0.0:8080
domain: 0
tls:
  cert: /etc/zerodds/tls/server.crt
  key: /etc/zerodds/tls/server.key
  client_ca: /etc/zerodds/tls/clients.crt  # optional, mTLS
auth:
  mode: bearer  # none | bearer | jwt | mtls | sasl-plain
  tokens: /etc/zerodds/tokens.txt
acl:
  default: deny
  rules:
    - { subject: "user:alice", op: read, topic: "Sensors/*" }
    - { subject: "group:editors", op: write, topic: "Commands/#" }
metrics:
  enabled: true
  address: 127.0.0.1:9090
```
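To make the ACL semantics concrete, here is a minimal sketch of default-deny rule evaluation. The matching semantics are assumptions, not from the ZeroDDS source: `*` is treated as a glob wildcard and the MQTT-style `#` as a match-anything suffix.

```python
from fnmatch import fnmatch

# Hypothetical rule set mirroring the config example above.
RULES = [
    {"subject": "user:alice", "op": "read", "topic": "Sensors/*"},
    {"subject": "group:editors", "op": "write", "topic": "Commands/#"},
]

def topic_matches(pattern: str, topic: str) -> bool:
    # Assumption: treat the MQTT-style "#" multi-level wildcard as a glob "*".
    return fnmatch(topic, pattern.replace("#", "*"))

def allowed(subject: str, op: str, topic: str, default: str = "deny") -> bool:
    # First matching rule grants access; otherwise fall back to the default.
    for rule in RULES:
        if (rule["subject"] == subject and rule["op"] == op
                and topic_matches(rule["topic"], topic)):
            return True
    return default == "allow"
```

With `default: deny`, any subject/op/topic combination not covered by an explicit rule is rejected, which is why the hardening section below recommends whitelisting.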
## Monitoring
Built-in observability:
- Prometheus — `/metrics` on `:9090` per daemon. Counters: `frames_in/out_total`, `bytes_in/out_total`, `connections_active`, `dds_samples_in/out_total`, `errors_total`.
- OpenTelemetry — OTLP/HTTP/JSON exporter via `zerodds-observability-otlp`. Standard histograms: `dds.write.latency`, `dds.read.latency`, `dds.heartbeat.rtt`, `dds.discovery.match.duration`.
- Catalog — `/catalog` returns JSON: service name, version, topics + QoS profiles.
- Health — `/healthz` returns 200 OK while ready, 503 during shutdown.
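If you script against `/metrics` directly rather than through Prometheus, a tiny parser for the text exposition format is enough. This is a sketch only: it ignores `HELP`/`TYPE` lines, timestamps, and label-value escaping, and the sample payload below is illustrative, not real ZeroDDS output.

```python
import re

# Illustrative /metrics payload (made-up values).
SAMPLE = """\
# HELP connections_active Currently open client connections
# TYPE connections_active gauge
connections_active 12
frames_in_total{proto="ws"} 48211
errors_total 3
"""

def parse_metrics(text: str) -> dict:
    """Map 'name{labels}' -> value, skipping comments and blank lines."""
    metrics = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        m = re.match(r'([A-Za-z_:][A-Za-z0-9_:]*)(\{[^}]*\})?\s+(\S+)', line)
        if m:
            name, labels, value = m.groups()
            metrics[name + (labels or "")] = float(value)
    return metrics
```

Useful for quick shell checks or alert prototypes before a real Prometheus scrape job is in place.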
## Backup & Recovery
Use the zerodds-recorder-bridge daemon to capture topics to .zddsrec files. zerodds-replay reads recordings and republishes at scaled wallclock for disaster recovery, demo replay, or regression testing. State that survives is in the recordings; daemons themselves are stateless beyond TLS keys.
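The "scaled wallclock" replay boils down to simple timing math. The `.zddsrec` record layout and `zerodds-replay` flags are not documented here; this sketch only illustrates how inter-sample delays scale.

```python
def playback_delays(timestamps_s, scale=1.0):
    """Seconds to wait before republishing each recorded sample.

    timestamps_s: recording timestamps in seconds, ascending.
    scale: > 1.0 replays faster than recorded, < 1.0 slower.
    """
    delays = []
    prev = timestamps_s[0]
    for t in timestamps_s:
        delays.append((t - prev) / scale)  # gap to previous sample, rescaled
        prev = t
    return delays
```

A replay loop would sleep each delay, then republish the corresponding sample on its original topic.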
## Security Hardening
- Pin TLS to TLS 1.3 only (rustls 0.23 default).
- Use mTLS for daemon-to-daemon links; pure bearer is acceptable for browser clients.
- Default ACL to `deny`; whitelist explicit rules.
- Rotate TLS certificates via SIGHUP (`RotatingTlsConfig` reloads without dropping connections).
- Run daemons under unprivileged users with systemd `ProtectSystem=strict`, `NoNewPrivileges=true`.
- Enable DDS-Security plugins via Participant QoS for end-to-end RTPS-AAD.
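One detail worth getting right if you build tooling around the bearer-token file (`auth.tokens`): compare tokens in constant time. A sketch, assuming one token per line (the real file format may differ):

```python
import hmac

def load_tokens(path):
    """Assumed format: one bearer token per line, blank lines ignored."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def token_valid(presented: str, tokens) -> bool:
    # hmac.compare_digest avoids leaking token contents via timing,
    # unlike a plain == comparison that exits on the first mismatch.
    return any(hmac.compare_digest(presented, t) for t in tokens)
```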
## Capacity Planning
Two questions to answer before sizing: how much data per second per process, and what is the bottleneck when scaling beyond that. The numbers below are reference points from the in-tree benchmark suite — single-process, no GC pause, on the llvm bench host (AMD Ryzen Threadripper PRO 3955WX, 24 cores, Linux 6.1 vanilla). Use them as upper-bound guidance: your application's serialisation, QoS profile and broker configuration will dominate well before the bridge does.
| Path | Reference rate | Bottleneck when scaling |
|---|---|---|
| DDS over RTPS UDP (LAN) | ~4 GiB/s payload | NIC, kernel UDP buffer |
| DDS over shared memory | < 5 µs roundtrip | cache-line traffic, polling vs. eventfd wake-up |
| WebSocket bridge | ~250k frames / s | TLS + per-frame allocation |
| MQTT bridge | ~100k messages / s | upstream broker (mosquitto / HiveMQ) |
| gRPC bridge | ~50k unary calls / s | HTTP/2 connection & HPACK table churn |
| AMQP bridge | ~100k messages / s | broker (RabbitMQ) flow-control credits |
### Sizing guidance
- Scale horizontally first — a single bridge daemon saturates roughly one core's worth of TLS plus framing. Run one daemon per protocol per node, not one giant daemon for everything.
- QoS dominates — Reliable + KeepAll + small history depth produces back-pressure long before bandwidth runs out. Pick KeepLast(N) with N matching your reader's worst-case lag.
- Memory budget per participant — about 2 MB plus history-cache (sample size × N × instances). Plan for 100–500 participants per host comfortably.
- Discovery cost — SPDP traffic is O(participants²) on the multicast group. Above a few hundred participants, partition by domain ID or use a unicast discovery peer-list.
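The memory-budget bullet above turns into straightforward arithmetic. A back-of-envelope sketch (the 2 MB base and the 50% headroom factor are the guide's rule of thumb and an assumption, respectively, not measured values):

```python
MB = 1024 * 1024

def participant_memory_bytes(sample_bytes, history_depth, instances,
                             base_bytes=2 * MB):
    """Base cost plus history cache: sample size x KeepLast(N) x instances."""
    return base_bytes + sample_bytes * history_depth * instances

def participants_per_host(host_ram_bytes, per_participant_bytes,
                          headroom=0.5):
    # Reserve `headroom` of RAM for the OS, sockets and transport buffers.
    return int(host_ram_bytes * (1 - headroom) // per_participant_bytes)
```

For example, 1 KiB samples with KeepLast(16) over 8 instances cost just over 2 MB per participant, so a 32 GiB host with 50% headroom lands comfortably inside the 100-500 participants-per-host guidance with room to spare.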
## Upgrade Path
Within 1.x: drop-in replacement, no QoS changes required, wire stays RTPS 2.5. Across major versions: see the migration note in the corresponding release.