A scalable architecture redesign for RAKwireless — replacing synchronous HTTP calls and scheduled actions with an event-driven message bus, enabling real-time data flow, fault tolerance, and horizontal scalability.
The current system relies on scheduled actions (cron polling) for inter-service communication and direct synchronous HTTP calls for external APIs. Every data exchange requires either a timer-based query or a blocking request.
The current polling-and-HTTP architecture creates compounding performance, reliability, and maintainability issues as traffic grows.
Scheduled actions query large tables every cycle — even when nothing has changed. During peak scanning, MES tables grow rapidly and these repeated queries compete with production workers for the same database resources.
When WISDM or the shipping forwarder is slow or unreachable, the Odoo cron blocks or fails silently. There's no retry mechanism, no dead-letter handling, and no visibility into what was lost.
MES ↔ ERP sync waits for the next cron tick. Print jobs wait for the next polling cycle. Every data flow has an artificial delay dictated by timer intervals, not by actual data readiness.
Machines push directly to the MES API. During production peaks, MES has no way to say "slow down." It either processes everything or drops requests. There's no queue to absorb bursts.
Every integration is a point-to-point connection. Adding a new consumer (analytics, a new API, monitoring) means modifying existing scheduled actions and creating new HTTP endpoints.
When something fails, there's no centralized view. Failed HTTP calls may log to Odoo but there's no cross-system tracing. Understanding "what happened to order X" requires checking multiple systems manually.
Replace polling and direct HTTP calls with a federated RabbitMQ message bus. Each service publishes events and subscribes to the topics it needs, with a dedicated Node-RED / bridge container per external service.
Each Node-RED / bridge instance is a dedicated container scoped to one external service. WISDM credentials live only in the WISDM bridge. Shipping API keys live only in the shipping bridge. No credential sharing across boundaries — each bridge is an external extension of the system it connects to.
Moving from polling to event-driven delivers improvements across every dimension of the system.
Events fire on state change, not on timer. MO updates, print jobs, shipping triggers — all propagate in milliseconds instead of waiting for the next cron cycle.
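A minimal sketch of what "fire on state change" means on the publisher side. The `source.entity.action` routing-key scheme and the `make_event` helper are assumptions for illustration, not an existing convention in the current system; the commented pika call shows where a real broker publish would go.

```python
import json

def make_event(source: str, entity: str, action: str, payload: dict) -> tuple[str, bytes]:
    """Build a routing key and JSON body for a state-change event.

    Routing-key scheme (an assumed convention): <source>.<entity>.<action>,
    e.g. "erp.mo.state_changed". Consumers bind with wildcards on these keys.
    """
    routing_key = f"{source}.{entity}.{action}"
    body = json.dumps({"entity": entity, "action": action, "data": payload}).encode()
    return routing_key, body

# With a live broker, the publish itself would be one pika call, e.g.:
# channel.basic_publish(exchange="events", routing_key=routing_key, body=body,
#                       properties=pika.BasicProperties(delivery_mode=2))  # persistent
```

The event is emitted at the moment the write commits, so downstream latency is network time, not a cron interval.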
Eliminate polling queries on large MES tables. No more scheduled actions scanning for "things to sync" — the database only works when real data flows through.
If WISDM is down, messages queue in RabbitMQ and deliver when it recovers. Dead-letter queues capture failures for replay. No more silent data loss from failed HTTP calls.
During production peaks, the message queue absorbs burst traffic. MES and consumers process at their sustainable rate. Machines never get rejected.
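A deterministic toy simulation of burst absorption, assuming a consumer with a fixed sustainable rate (the numbers are arbitrary): the queue depth spikes during the burst and drains back to zero, and no message is ever rejected.

```python
from collections import deque

def simulate(burst_sizes: list[int], rate_per_tick: int) -> list[int]:
    """Feed bursts into a queue while a consumer drains at a fixed rate.

    Returns the queue depth after each tick; a final depth of 0 means
    every message was eventually processed.
    """
    queue, depths = deque(), []
    ticks = list(burst_sizes)
    while ticks or queue:
        if ticks:
            queue.extend(range(ticks.pop(0)))       # machines publish a burst
        for _ in range(min(rate_per_tick, len(queue))):
            queue.popleft()                         # MES consumes at its own pace
        depths.append(len(queue))
    return depths
```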
Need analytics? Subscribe to the bus. New API integration? Add a consumer. No existing code changes — just attach a new listener to existing topics.
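"Subscribe to the bus" works because RabbitMQ topic exchanges match routing keys against binding patterns: `*` matches exactly one dot-separated word, `#` matches zero or more. A pure-Python re-implementation of that matching rule, for illustration only:

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if an AMQP-style binding pattern matches a routing key."""
    p, k = pattern.split("."), routing_key.split(".")

    def match(i: int, j: int) -> bool:
        if i == len(p):
            return j == len(k)
        if p[i] == "#":                       # zero or more words
            return any(match(i + 1, jj) for jj in range(j, len(k) + 1))
        if j == len(k):
            return False
        return (p[i] == "*" or p[i] == k[j]) and match(i + 1, j + 1)

    return match(0, 0)
```

An analytics service could bind `#` to see everything, while a print consumer binds only `mes.print.*` — neither change touches the publishers.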
Each bridge container holds only the credentials for its target system. Compromising one integration exposes zero credentials for others.
With all data flowing through the message bus, we gain a single point of observability across every integration path.
Built-in dashboard for queue depths, message rates, consumer health, and federation link status between online and offline brokers.
Metrics from all containers — CPU, memory, message throughput, error rates. Custom dashboards for production KPIs and alert thresholds.
Visual flow debugging for every integration path. See messages flowing in real-time, inspect payloads, and troubleshoot without touching code.
Queue depth thresholds, consumer lag, federation link drops, dead-letter queue entries. Notifications via webhook, email, or messaging.
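The alert checks above can be driven from RabbitMQ's management API (`GET /api/queues` returns one JSON object per queue, including `name` and `messages`). A sketch of the threshold logic — the `.dlq` naming suffix and the threshold values are assumptions:

```python
def check_queues(stats: list[dict], max_depth: int = 1000, max_dlq: int = 0) -> list[str]:
    """Flag queues whose backlog or dead-letter count crosses a threshold.

    `stats` has the shape of RabbitMQ's GET /api/queues response: each
    entry carries at least "name" and "messages" (ready + unacked).
    """
    alerts = []
    for q in stats:
        if q["name"].endswith(".dlq") and q["messages"] > max_dlq:
            alerts.append(f"dead-letter entries in {q['name']}: {q['messages']}")
        elif q["messages"] > max_depth:
            alerts.append(f"queue depth high on {q['name']}: {q['messages']}")
    return alerts
```

In practice the same numbers would come from Prometheus and fire through Alertmanager, but the decision rule is this simple.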
| Dimension | Current Architecture | Proposed Architecture |
|---|---|---|
| Data flow trigger | ⏱ Cron timer (every X min) | ⚡ On event (instant) |
| External API calls | Synchronous, blocking | Async with retry & DLQ |
| Failure handling | Silent failure, data loss | Store-and-forward, replay |
| DB load pattern | Constant polling queries | On-demand, event-triggered |
| Peak traffic handling | Direct API hit, no buffer | Queue absorbs bursts |
| Adding new consumers | Modify cron + new endpoints | Subscribe to topic |
| MES print latency | Next poll cycle (sec–min) | Sub-second delivery |
| Offline/online sync | Cron pull/push, delay + risk | Federation, auto store & forward |
| Observability | Per-system logs, manual tracing | Centralized metrics & dashboards |
| Credential security | Shared in Odoo server actions | Isolated per bridge container |
Each phase runs alongside existing scheduled actions. Old cron jobs serve as fallback until the new path is validated, then are decommissioned.
Containerize RabbitMQ (online + offline) with federation configured. Set up Node-RED bridges. Migrate the print job flow first — it's self-contained, high-impact, and proves the entire pattern end to end.
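Federation between the online and offline brokers is configured by setting a `federation-upstream` runtime parameter on the downstream broker, e.g. via `PUT /api/parameters/federation-upstream/<vhost>/<name>` on the management API. A sketch of building that payload — host names and credentials below are placeholders, not the real deployment:

```python
import json

def federation_upstream_payload(uri: str, expires_ms: int = 3600000) -> str:
    """JSON body for the federation-upstream parameter on the downstream broker.

    `uri` points at the upstream (e.g. the offline/factory broker); `expires`
    controls how long buffered messages survive a dropped federation link.
    """
    return json.dumps({"value": {"uri": uri, "expires": expires_ms}})

# Placeholder host and credentials for illustration only:
# requests.put(
#     "http://online-broker:15672/api/parameters/federation-upstream/%2F/offline",
#     auth=("admin", "..."),
#     headers={"content-type": "application/json"},
#     data=federation_upstream_payload("amqp://user:pass@offline-broker"),
# )
```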
Move WISDM device activation and shipping forwarder label requests to async consumers via dedicated bridge containers. Add dead-letter queues for retry logic. Remove blocking HTTP calls from Odoo cron.
Replace the MO/WO pull-and-push scheduled actions with event-driven federation. Odoo publishes state changes, offline broker replicates them to MES, and vice versa for completed work orders.
Route PCBA tester and machine data through the offline message bus. Deploy Grafana + Prometheus stack. All data flows are now observable, buffered, and event-driven. Decommission remaining cron jobs.