FastWorker

The NNG control-plane architecture, explained

A deep dive into FastWorker's internals — how NNG socket patterns, a first-class control plane, and in-memory dispatch combine into a brokerless task queue.

Dipankar Sarkar
nng · architecture · internals · messaging

When we talk about FastWorker being “brokerless,” what we really mean is that coordination — the job usually outsourced to Redis or RabbitMQ — lives inside a Python process we call the control plane. This post digs into how that works at the messaging level, and why the design lands where it does.

The cast of characters

A FastWorker deployment has three kinds of process:

  • Client — your application (typically a FastAPI app). Embeds fastworker.Client.
  • Control plane — a standalone Python process. The coordinator.
  • Subworkers — optional Python processes that execute tasks. You spawn them to scale out.

Only the control plane is required. A minimal deployment is one control plane process, because the control plane can also execute tasks. Add subworkers when you want to scale horizontally.

Three NNG socket patterns, one architecture

FastWorker uses three NNG socket patterns from pynng. If you’ve worked with ZeroMQ these will look familiar.

1. REQ/REP — task submission

When a client calls await client.delay("send_email", user_id), the client opens a REQ socket to the control plane and sends a serialized task envelope:

client  --REQ-->  control plane
        <--REP--  task_id

The control plane responds immediately with a task id. The actual work happens asynchronously on the control plane’s side.

Why REQ/REP? Clients need a guaranteed response — a task id to track the submission. REQ/REP is the simplest pattern that gives you that.

2. DEALER/ROUTER — dispatch

When the control plane picks a subworker to run a task, it uses a DEALER/ROUTER pair. The control plane is the ROUTER (aware of multiple clients); subworkers are DEALERs.

control plane (ROUTER) --task-->  subworker A (DEALER)
                       <--result-- subworker A
control plane (ROUTER) --task-->  subworker B (DEALER)
                       <--result-- subworker B

Unlike REQ/REP, DEALER/ROUTER can send multiple messages in flight and doesn’t require strict lockstep. That’s exactly what we want for dispatch — the control plane issues tasks as they’re picked, subworkers return results as they finish, and neither side blocks.

Why DEALER/ROUTER? It’s the standard fan-out pattern for worker pools. Each subworker is addressable by the router, and messages can flow freely in both directions.

3. PUB/SUB — discovery and heartbeats

How do subworkers find the control plane in the first place? The control plane runs a PUB socket on a well-known discovery address (default tcp://127.0.0.1:5550). Subworkers subscribe, receive control plane metadata on startup, and connect back via DEALER.

control plane (PUB) --discovery-->  subworker (SUB)
                   heartbeats, registry, shutdown

The same channel carries heartbeats: if a subworker stops sending them, the control plane drops it from the registry and in-flight tasks on that worker fail (for retry).

Why PUB/SUB? Discovery and heartbeats are naturally broadcast-shaped — the control plane publishes once, many subworkers consume. PUB/SUB is the right primitive.

What lives in the control plane

The control plane is a pure Python process running a handful of coroutines on a single asyncio loop:

  1. REQ/REP listener — accepts task submissions from clients
  2. PUB publisher — broadcasts discovery metadata and heartbeats
  3. ROUTER dispatcher — sends tasks to subworkers, collects results
  4. Local executor — runs tasks itself when no subworker is free
  5. Dispatcher loop — picks the next task (priority-sorted) and assigns it
  6. Result cache — in-memory LRU of task results (default 10K entries, 1h TTL)
  7. GUI server — serves the built-in web dashboard

All of this is in one Python process. There’s no database, no Redis, no separate scheduler. Everything is in memory.
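Item 6, the result cache, is conceptually just an ordered dict with two eviction rules. A minimal sketch — the class name and method signatures are illustrative, not FastWorker's API; only the defaults (10K entries, 1h TTL) come from the list above:

```python
import time
from collections import OrderedDict


class ResultCache:
    """In-memory LRU with per-entry TTL."""

    def __init__(self, max_entries=10_000, ttl=3600.0):
        self.max_entries = max_entries
        self.ttl = ttl
        self._data = OrderedDict()  # task_id -> (expires_at, result)

    def put(self, task_id, result):
        self._data[task_id] = (time.monotonic() + self.ttl, result)
        self._data.move_to_end(task_id)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict the least recently used

    def get(self, task_id):
        item = self._data.get(task_id)
        if item is None:
            return None
        expires_at, result = item
        if time.monotonic() > expires_at:
            del self._data[task_id]  # expired
            return None
        self._data.move_to_end(task_id)  # refresh recency
        return result
```

Because it lives in the control plane's own process, a lookup is a dict access, not a network call.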

The dispatcher, in detail

The core of FastWorker is a dispatcher loop that runs inside the control plane. Simplified pseudocode:

async def dispatch_loop(self):
    while not self.shutdown:
        task = await self.queue.get_next_by_priority()
        if task is None:
            await asyncio.sleep(0.01)
            continue
        worker = self.pick_least_loaded_subworker()
        if worker is None:
            # no subworkers — run locally
            await self.execute_locally(task)
        else:
            await self.send_to_subworker(worker, task)

Three things to notice:

  1. Priority sort is a single queue. There aren’t separate queues per priority level; it’s one priority queue ordered by level.
  2. Least-loaded selection is a simple counter. No consistent hashing, no sticky assignment, no fair-queueing algorithm. Just “whichever subworker currently has the fewest tasks in flight.”
  3. Local execution is a first-class fallback. If there are no subworkers — the minimal-deployment case — the control plane runs the task itself.
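The “simple counter” in point 2 really can be this small. A sketch, assuming the control plane keeps an in-flight count per subworker (the function name and dict shape are illustrative):

```python
def pick_least_loaded(in_flight):
    """in_flight: dict mapping subworker id -> tasks currently in flight.

    Returns the id with the fewest in-flight tasks, or None when the
    registry is empty (triggering the local-execution fallback above).
    """
    if not in_flight:
        return None
    return min(in_flight, key=in_flight.get)
```

Ties go to whichever worker `min` encounters first; the real dispatcher may break ties differently.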

Why not just use Redis for coordination?

You could. Redis has pub/sub, lists, and sorted sets; you could build FastWorker’s dispatcher on top of them and get durable persistence for free. That’s more or less what Celery does.

The problem is what you take on in exchange:

  • Another service to run, monitor, patch, secure. Redis is genuinely good, but it is not nothing.
  • Network hops per task. Producer → Redis → worker. We cut out the middle hop.
  • Two tech stacks. Your team now needs Python and Redis expertise.
  • An operational surface you can’t inspect with pdb. If something is wrong, it’s either in Python or in Redis or in the boundary between them.

For teams that need Redis’s features (persistence, cross-language access, broker-backed delivery guarantees), those costs are worth paying. For teams that don’t, they’re pure overhead. FastWorker is the design that falls out of asking “what if we don’t pay them?”

Failure modes

The honest downside of keeping coordination in a single process is that the process is a single point of coordination. If the control plane crashes, in-memory state is lost:

  • Queued tasks that hadn’t been dispatched yet → lost
  • In-flight tasks on subworkers → complete, but the result cache loses them
  • Client connections → drop, reconnect on restart

This is a conscious tradeoff. For the target use case — short tasks, modest volume, restartable work — it’s acceptable. For workloads where queued tasks must survive a crash, you should either (a) drive retry from your application state, not the queue, or (b) use a broker-backed queue like Celery.

Subworker failures are cleaner: the control plane notices missing heartbeats, removes the worker from the registry, and can redispatch or fail in-flight tasks. No manual intervention required.
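Heartbeat-based eviction reduces to a timestamp sweep. A sketch — the 5-second timeout and function name are assumptions for illustration, not FastWorker's actual values:

```python
import time


def prune_dead_workers(registry, timeout=5.0, now=None):
    """registry: dict mapping subworker id -> last heartbeat timestamp.

    Removes workers whose last heartbeat is older than `timeout` seconds
    and returns their ids so their in-flight tasks can be redispatched
    or failed for retry.
    """
    now = time.monotonic() if now is None else now
    dead = [w for w, last in registry.items() if now - last > timeout]
    for w in dead:
        del registry[w]
    return dead
```

Run on a timer inside the control plane's event loop, this is the entire failure detector.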

What we didn’t build

Things that would have been tempting and that we left out:

  • A persistent task log. Would have made FastWorker a tiny broker. Out of scope.
  • Exactly-once delivery. Requires distributed consensus. Don’t want it.
  • Multiple control planes with replication. Requires Raft or equivalent. Not this system.
  • Cross-language subworkers. NNG is language-agnostic, but tasks are Python functions. Not worth the abstraction.

The scope constraint is a feature. Everything we didn’t build is something we don’t have to maintain.

In one sentence

FastWorker is what happens when you put the coordination logic of a task queue into a first-class Python process, pick NNG for the wire, and refuse to add anything else.

If that sentence makes sense to you, you already understand how the whole system works.

Frequently asked questions

What is NNG?

NNG (nanomsg-next-generation) is a lightweight C messaging library that implements reusable socket patterns like REQ/REP, DEALER/ROUTER, PUB/SUB, and PAIR. It's the successor to nanomsg and is used here via pynng, its Python binding.

Why not ZeroMQ?

ZeroMQ would work. NNG has a cleaner threading model, a more modern API, and a better async Python story through pynng. Either is a reasonable choice for this kind of system.

Is this safe for production?

FastWorker is alpha software. The architecture is sound and tested, but we reserve the right to change APIs before 1.0. For production, verify it against your workload and read the limitations doc.