Operator Guide

For operators running ZeroDDS in production. Seven sections cover the full lifecycle.

Production Deployment

Three deployment modes:

Linux systemd — .deb / .rpm ship pre-built unit files. systemctl enable zerodds-<bridge>.
macOS launchd — Homebrew installs org.zerodds.<bridge>.plist. brew services start zerodds.
Windows Service — WiX MSI registers each daemon via sc.exe. Run Install-Services.ps1 as administrator.

Configuration Files

Each daemon reads /etc/zerodds/<daemon>.yaml. Schema:

listen: 0.0.0.0:8080
domain: 0
tls:
  cert: /etc/zerodds/tls/server.crt
  key: /etc/zerodds/tls/server.key
  client_ca: /etc/zerodds/tls/clients.crt   # optional, mTLS
auth:
  mode: bearer                                # none | bearer | jwt | mtls | sasl-plain
  tokens: /etc/zerodds/tokens.txt
acl:
  default: deny
  rules:
    - { subject: "user:alice", op: read,  topic: "Sensors/*" }
    - { subject: "group:editors", op: write, topic: "Commands/#" }
metrics:
  enabled: true
  address: 127.0.0.1:9090
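
A configuration like the one above can be linted before a daemon is restarted. The following is a minimal sketch in Python and is not part of ZeroDDS: the function name check_config and the individual rules (for example, that mtls needs tls.client_ca, or that bearer needs auth.tokens) are assumptions inferred from the comments in the schema. The config is passed as an already-parsed dict so the example needs no YAML library.

```python
# Hypothetical pre-flight lint for a ZeroDDS daemon config.
# The rules below are assumptions derived from the schema comments,
# not the daemon's own validation logic.
ALLOWED_AUTH_MODES = {"none", "bearer", "jwt", "mtls", "sasl-plain"}
REQUIRED_RULE_KEYS = {"subject", "op", "topic"}


def check_config(cfg: dict) -> list[str]:
    """Return a list of human-readable problems; empty means no findings."""
    problems = []
    tls = cfg.get("tls") or {}
    auth = cfg.get("auth") or {}
    acl = cfg.get("acl") or {}

    mode = auth.get("mode")
    if mode not in ALLOWED_AUTH_MODES:
        problems.append(f"auth.mode {mode!r} is not a known mode")
    if mode == "mtls" and not tls.get("client_ca"):
        problems.append("auth.mode is mtls but tls.client_ca is not set")
    if mode == "bearer" and not auth.get("tokens"):
        problems.append("auth.mode is bearer but auth.tokens is not set")

    # The hardening advice in this guide is to default the ACL to deny.
    if acl.get("default") != "deny":
        problems.append("acl.default should be deny")

    for index, rule in enumerate(acl.get("rules") or []):
        missing = REQUIRED_RULE_KEYS - rule.keys()
        if missing:
            problems.append(f"acl.rules[{index}] lacks {sorted(missing)}")
    return problems


example = {
    "listen": "0.0.0.0:8080",
    "tls": {"cert": "/etc/zerodds/tls/server.crt",
            "key": "/etc/zerodds/tls/server.key"},
    "auth": {"mode": "bearer", "tokens": "/etc/zerodds/tokens.txt"},
    "acl": {"default": "deny",
            "rules": [{"subject": "user:alice", "op": "read",
                       "topic": "Sensors/*"}]},
}
print(check_config(example))
```

Returning a list rather than raising on the first finding lets a deployment script report every problem in one pass.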

Monitoring

Built-in observability:

Prometheus — /metrics on :9090 per daemon. frames_in/out_total, bytes_in/out_total, connections_active, dds_samples_in/out_total, errors_total.
OpenTelemetry — OTLP/HTTP/JSON exporter via zerodds-observability-otlp. Standard histograms: dds.write.latency, dds.read.latency, dds.heartbeat.rtt, dds.discovery.match.duration.
Catalog — /catalog JSON: service name, version, topics + QoS profiles.
Health — /healthz returns 200 OK while ready, 503 during shutdown.
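
The health endpoint above lends itself to a simple readiness probe. The sketch below uses only the Python standard library; the function names and the "unknown" and "unreachable" states are illustrative assumptions, and the base URL (host and port on which a given daemon serves /healthz) must be supplied by the caller.

```python
import urllib.error
import urllib.request


def health_state(status: int) -> str:
    """Map the documented /healthz status codes to a state name."""
    if status == 200:
        return "ready"
    if status == 503:
        return "shutting-down"
    return "unknown"  # any other code is not described in this guide


def probe(base_url: str, timeout: float = 2.0) -> str:
    """Query <base_url>/healthz once and classify the answer."""
    try:
        with urllib.request.urlopen(f"{base_url}/healthz", timeout=timeout) as resp:
            return health_state(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the code is still meaningful.
        return health_state(err.code)
    except (urllib.error.URLError, OSError):
        return "unreachable"
```

A load balancer or orchestrator would typically stop routing to a daemon as soon as the probe stops returning "ready".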

Backup & Recovery

Use the zerodds-recorder-bridge daemon to capture topics to .zddsrec files. zerodds-replay reads a recording and republishes it at a scaled wall-clock rate, which covers disaster recovery, demo replay, and regression testing. The only state that survives a failure is in the recordings; the daemons themselves are stateless apart from their TLS keys.
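
To illustrate what replaying at a scaled wall clock means, the sketch below turns recorded sample timestamps into send offsets for a given speed factor. It is an illustration of the idea only: the function name and the list-of-seconds input are hypothetical and say nothing about the .zddsrec format or the actual zerodds-replay options.

```python
def replay_offsets(timestamps: list[float], speed: float) -> list[float]:
    """Seconds after replay start at which each recorded sample is sent.

    timestamps: capture times in seconds, in recording order.
    speed: 2.0 replays twice as fast as recorded, 0.5 at half speed.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    if not timestamps:
        return []
    start = timestamps[0]
    # Gaps between samples shrink or stretch by the speed factor.
    return [(t - start) / speed for t in timestamps]


print(replay_offsets([10.0, 10.5, 12.0], speed=2.0))
```

At speed 2.0, a two-second recording is republished in one second; ordering and relative spacing are preserved.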

Security Hardening

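Pin TLS to TLS 1.3 only (described as the default with rustls 0.23; stock rustls also enables TLS 1.2 unless its protocol versions are restricted).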
Use mTLS for daemon-to-daemon links; pure bearer is acceptable for browser clients.
Default ACL to deny; whitelist explicit rules.
Rotate TLS certificates via SIGHUP (RotatingTlsConfig reloads without dropping connections).
Run daemons under unprivileged users with systemd ProtectSystem=strict, NoNewPrivileges=true.
Enable DDS-Security plugins via Participant QoS for end-to-end RTPS-AAD.
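
The SIGHUP-based certificate rotation above could be scripted roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, and how the daemon PID is obtained (for example from the service manager) is left to the caller. New files are moved into place with an atomic rename so the daemon never reads a half-written certificate when it reloads.

```python
import os
import shutil
import signal
import tempfile


def install_atomically(src: str, dst: str) -> None:
    """Copy src into dst's directory, then rename over dst in one step."""
    dst_dir = os.path.dirname(os.path.abspath(dst))
    # mkstemp creates the file with mode 0600; adjust ownership separately
    # if the daemon runs under a different unprivileged user.
    fd, tmp = tempfile.mkstemp(dir=dst_dir, prefix=".rotate-")
    os.close(fd)
    try:
        shutil.copyfile(src, tmp)
        os.replace(tmp, dst)  # atomic on the same filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def rotate_tls(new_cert: str, new_key: str,
               cert_path: str, key_path: str, daemon_pid: int) -> None:
    """Install a new key and certificate, then ask the daemon to reload."""
    install_atomically(new_key, key_path)
    install_atomically(new_cert, cert_path)
    # Signal only after both files are in place so the pair is consistent.
    os.kill(daemon_pid, signal.SIGHUP)
```

With the paths from the configuration schema, a call would look like rotate_tls("new.crt", "new.key", "/etc/zerodds/tls/server.crt", "/etc/zerodds/tls/server.key", pid).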

Capacity Planning

Two questions to answer before sizing: how much data per second per process, and what is the bottleneck when scaling beyond that. The numbers below are reference points from the in-tree benchmark suite — single-process, no GC pause, on the bench host (AMD Ryzen Threadripper PRO 3955WX, 24 cores, Linux 6.1 vanilla). Use them as upper-bound guidance: your application's serialisation, QoS profile and broker configuration will dominate well before the bridge does.

Path	Reference rate or latency	Bottleneck when scaling
DDS over RTPS UDP (LAN)	~4 GiB/s payload	NIC, kernel UDP buffer
DDS over shared memory	< 5 µs roundtrip	cache-line traffic, polling vs. eventfd wake-up
WebSocket bridge	~250k frames / s	TLS + per-frame allocation
MQTT bridge	~100k messages / s	upstream broker (mosquitto / HiveMQ)
gRPC bridge	~50k unary calls / s	HTTP/2 connection & HPACK table churn
AMQP bridge	~100k messages / s	broker (RabbitMQ) flow-control credits
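
As a rough worked example of using the table, the sketch below estimates how many bridge daemons a target message rate calls for. The reference rates are copied from the table; the function name and the default 50% utilisation ceiling are assumptions added here to reflect that the figures are upper bounds, and the estimate assumes load spreads evenly across daemons.

```python
import math

# Reference rates per second, taken from the table above.
REFERENCE_RATES = {
    "websocket": 250_000,  # frames / s
    "mqtt": 100_000,       # messages / s
    "grpc": 50_000,        # unary calls / s
    "amqp": 100_000,       # messages / s
}


def daemons_needed(path: str, target_rate: float, utilisation: float = 0.5) -> int:
    """Daemons required if each runs at `utilisation` of its reference rate."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    usable = REFERENCE_RATES[path] * utilisation
    return max(1, math.ceil(target_rate / usable))


print(daemons_needed("websocket", 400_000))
print(daemons_needed("grpc", 10_000))
```

For the MQTT and AMQP paths the upstream broker is the listed bottleneck, so adding bridge daemons alone may not raise throughput.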

Sizing guidance

Scale horizontally first: a single bridge daemon saturates roughly one core with TLS and framing work. Run one daemon per protocol per node rather than one large daemon for everything.
QoS dominates: Reliable + KeepAll with small resource limits produces back-pressure long before bandwidth runs out. Prefer KeepLast(N), with N sized to cover your reader's worst-case lag.
Memory budget per participant: about 2 MB plus the history cache (sample size × N × instances). A host comfortably carries 100–500 participants.
Discovery cost: SPDP traffic grows as O(participants²) on the multicast group. Above a few hundred participants, partition by domain ID or use a unicast discovery peer list.
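
The memory and discovery guidance above can be turned into a quick back-of-the-envelope calculation. The sketch below is illustrative only: the function names are hypothetical, "about 2 MB" is taken as 2,000,000 bytes, and the discovery figure counts ordered participant pairs to show quadratic growth rather than measuring real SPDP traffic.

```python
BASE_BYTES = 2_000_000  # "about 2 MB" per participant, per the guidance above


def participant_bytes(sample_size: int, depth: int, instances: int) -> int:
    """Base footprint plus history cache (sample size x N x instances)."""
    return BASE_BYTES + sample_size * depth * instances


def host_bytes(participants: int, sample_size: int, depth: int, instances: int) -> int:
    """Total for a host where every participant has the same profile."""
    return participants * participant_bytes(sample_size, depth, instances)


def discovery_pairs(participants: int) -> int:
    """Ordered (announcer, listener) pairs: grows as O(participants^2)."""
    return participants * (participants - 1)


# 200 participants, 1 KiB samples, KeepLast(10), 100 instances each.
print(host_bytes(200, 1024, 10, 100))  # 604800000 bytes, about 605 MB
print(discovery_pairs(200))            # 39800
print(discovery_pairs(400))            # 159600
```

Doubling the participant count roughly quadruples the pair count, which is why partitioning by domain ID pays off above a few hundred participants.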

Upgrade Path

Within 1.x, upgrades are drop-in replacements: no QoS changes are required and the wire protocol stays RTPS 2.5. Across major versions, see the migration note in the corresponding release.

Full handbook on GitHub →
