A scalable architecture redesign for RAKwireless — replacing synchronous HTTP calls and scheduled actions with an event-driven message bus, enabling real-time data flow, fault tolerance, and horizontal scalability.
The current system relies on scheduled actions (cron polling) for inter-service communication and direct synchronous HTTP calls for external APIs. Every data exchange requires either a timer-based query or a blocking request.
The current polling-and-HTTP architecture creates compounding performance, reliability, and maintainability issues as traffic grows.
Scheduled actions query large tables every cycle — even when nothing has changed. During peak scanning, MES tables grow rapidly and these repeated queries compete with production workers for the same database resources.
When WISDM or the shipping forwarder is slow or unreachable, the Odoo cron blocks or fails silently. There's no retry mechanism, no dead-letter handling, and no visibility into what was lost.
MES ↔ ERP sync waits for the next cron tick. Print jobs wait for the next polling cycle. Every data flow has an artificial delay dictated by timer intervals, not by actual data readiness.
Machines push directly to the MES API. During production peaks, MES has no way to say "slow down." It either processes everything or drops requests. There's no queue to absorb bursts.
Every integration is a point-to-point connection. Adding a new consumer (analytics, a new API, monitoring) means modifying existing scheduled actions and creating new HTTP endpoints.
When something fails, there's no centralized view. Failed HTTP calls may log to Odoo but there's no cross-system tracing. Understanding "what happened to order X" requires checking multiple systems manually.
Replace polling and direct HTTP calls with a federated RabbitMQ message bus. Each service publishes events and subscribes to the topics it needs, with a dedicated Node-RED / bridge container per external service.
Each Node-RED / bridge instance is a dedicated container scoped to one external service. WISDM credentials live only in the WISDM bridge. Shipping API keys live only in the shipping bridge. No credential sharing across boundaries — each bridge is an external extension of the system it connects to.
Moving from polling to event-driven delivers improvements across every dimension of the system.
Events fire on state change, not on timer. MO updates, print jobs, shipping triggers — all propagate in milliseconds instead of waiting for the next cron cycle.
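A minimal sketch of what "fire on state change" means on the publisher side. The `source.entity.action` routing-key scheme and the `make_event` helper are assumptions for illustration, not an existing convention in the current system; the commented pika call shows where a real broker publish would go.

```python
import json

def make_event(source: str, entity: str, action: str, payload: dict) -> tuple[str, bytes]:
    """Build a routing key and JSON body for a state-change event.

    Routing-key scheme (an assumed convention): <source>.<entity>.<action>,
    e.g. "erp.mo.state_changed". Consumers bind with wildcards on these keys.
    """
    routing_key = f"{source}.{entity}.{action}"
    body = json.dumps({"entity": entity, "action": action, "data": payload}).encode()
    return routing_key, body

# With a live broker, the publish itself would be one pika call, e.g.:
# channel.basic_publish(exchange="events", routing_key=routing_key, body=body,
#                       properties=pika.BasicProperties(delivery_mode=2))  # persistent
```

The event is emitted at the moment the write commits, so downstream latency is network time, not a cron interval.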
Eliminate polling queries on large MES tables. No more scheduled actions scanning for "things to sync" — the database only works when real data flows through.
If WISDM is down, messages queue in RabbitMQ and deliver when it recovers. Dead-letter queues capture failures for replay. No more silent data loss from failed HTTP calls.
During production peaks, the message queue absorbs burst traffic. MES and consumers process at their sustainable rate. Machines never get rejected.
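A deterministic toy simulation of burst absorption, assuming a consumer with a fixed sustainable rate (the numbers are arbitrary): the queue depth spikes during the burst and drains back to zero, and no message is ever rejected.

```python
from collections import deque

def simulate(burst_sizes: list[int], rate_per_tick: int) -> list[int]:
    """Feed bursts into a queue while a consumer drains at a fixed rate.

    Returns the queue depth after each tick; a final depth of 0 means
    every message was eventually processed.
    """
    queue, depths = deque(), []
    ticks = list(burst_sizes)
    while ticks or queue:
        if ticks:
            queue.extend(range(ticks.pop(0)))       # machines publish a burst
        for _ in range(min(rate_per_tick, len(queue))):
            queue.popleft()                         # MES consumes at its own pace
        depths.append(len(queue))
    return depths
```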
Need analytics? Subscribe to the bus. New API integration? Add a consumer. No existing code changes — just attach a new listener to existing topics.
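"Subscribe to the bus" works because RabbitMQ topic exchanges match routing keys against binding patterns: `*` matches exactly one dot-separated word, `#` matches zero or more. A pure-Python re-implementation of that matching rule, for illustration only:

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if an AMQP-style binding pattern matches a routing key."""
    p, k = pattern.split("."), routing_key.split(".")

    def match(i: int, j: int) -> bool:
        if i == len(p):
            return j == len(k)
        if p[i] == "#":                       # zero or more words
            return any(match(i + 1, jj) for jj in range(j, len(k) + 1))
        if j == len(k):
            return False
        return (p[i] == "*" or p[i] == k[j]) and match(i + 1, j + 1)

    return match(0, 0)
```

An analytics service could bind `#` to see everything, while a print consumer binds only `mes.print.*` — neither change touches the publishers.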
Each bridge container holds only the credentials for its target system. Compromising one integration exposes zero credentials for others.
With all data flowing through the message bus, we gain a single point of observability across every integration path.
Built-in dashboard for queue depths, message rates, consumer health, and federation link status between online and offline brokers.
Metrics from all containers — CPU, memory, message throughput, error rates. Custom dashboards for production KPIs and alert thresholds.
Visual flow debugging for every integration path. See messages flowing in real-time, inspect payloads, and troubleshoot without touching code.
Queue depth thresholds, consumer lag, federation link drops, dead-letter queue entries. Notifications via webhook, email, or messaging.
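The alert checks above can be driven from RabbitMQ's management API (`GET /api/queues` returns one JSON object per queue, including `name` and `messages`). A sketch of the threshold logic — the `.dlq` naming suffix and the threshold values are assumptions:

```python
def check_queues(stats: list[dict], max_depth: int = 1000, max_dlq: int = 0) -> list[str]:
    """Flag queues whose backlog or dead-letter count crosses a threshold.

    `stats` has the shape of RabbitMQ's GET /api/queues response: each
    entry carries at least "name" and "messages" (ready + unacked).
    """
    alerts = []
    for q in stats:
        if q["name"].endswith(".dlq") and q["messages"] > max_dlq:
            alerts.append(f"dead-letter entries in {q['name']}: {q['messages']}")
        elif q["messages"] > max_depth:
            alerts.append(f"queue depth high on {q['name']}: {q['messages']}")
    return alerts
```

In practice the same numbers would come from Prometheus and fire through Alertmanager, but the decision rule is this simple.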
| Dimension | Current Architecture | Proposed Architecture |
|---|---|---|
| Data flow trigger | ⏱ Cron timer (every X min) | ⚡ On event (instant) |
| External API calls | Synchronous, blocking | Async with retry & DLQ |
| Failure handling | Silent failure, data loss | Store-and-forward, replay |
| DB load pattern | Constant polling queries | On-demand, event-triggered |
| Peak traffic handling | Direct API hit, no buffer | Queue absorbs bursts |
| Adding new consumers | Modify cron + new endpoints | Subscribe to topic |
| MES print latency | Next poll cycle (sec–min) | Sub-second delivery |
| Offline/online sync | Cron pull/push, delay + risk | Federation, auto store & forward |
| Observability | Per-system logs, manual tracing | Centralized metrics & dashboards |
| Credential security | Shared in Odoo server actions | Isolated per bridge container |
Each phase runs alongside existing scheduled actions. Old cron jobs serve as fallback until the new path is validated, then are decommissioned.
Containerize RabbitMQ (online + offline) with federation configured. Set up Node-RED bridges. Migrate the print job flow first — it's self-contained, high-impact, and proves the entire pattern end to end.
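Federation between the online and offline brokers is configured by setting a `federation-upstream` runtime parameter on the downstream broker, e.g. via `PUT /api/parameters/federation-upstream/<vhost>/<name>` on the management API. A sketch of building that payload — host names and credentials below are placeholders, not the real deployment:

```python
import json

def federation_upstream_payload(uri: str, expires_ms: int = 3600000) -> str:
    """JSON body for the federation-upstream parameter on the downstream broker.

    `uri` points at the upstream (e.g. the offline/factory broker); `expires`
    controls how long buffered messages survive a dropped federation link.
    """
    return json.dumps({"value": {"uri": uri, "expires": expires_ms}})

# Placeholder host and credentials for illustration only:
# requests.put(
#     "http://online-broker:15672/api/parameters/federation-upstream/%2F/offline",
#     auth=("admin", "..."),
#     headers={"content-type": "application/json"},
#     data=federation_upstream_payload("amqp://user:pass@offline-broker"),
# )
```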
Move WISDM device activation and shipping forwarder label requests to async consumers via dedicated bridge containers. Add dead-letter queues for retry logic. Remove blocking HTTP calls from Odoo cron.
Replace the MO/WO pull-and-push scheduled actions with event-driven federation. Odoo publishes state changes, offline broker replicates them to MES, and vice versa for completed work orders.
Route PCBA tester and machine data through the offline message bus. Deploy Grafana + Prometheus stack. All data flows are now observable, buffered, and event-driven. Decommission remaining cron jobs.