FastWorker

Why we built a brokerless task queue

The backstory of FastWorker — why we decided Python deserved a task queue with zero external services, and what we learned shipping it.

Dipankar Sarkar

fastworker · opinion · architecture

Every few years, I find myself standing in a YAML file — a deployment manifest, a docker-compose, a Kubernetes chart — looking at a Redis service, a Celery service, a Celery beat, maybe a Flower, and asking myself whether all of this is really necessary for what is, when you squint, just “run a Python function somewhere else.” Usually the answer is yes, because that’s what people do, and the alternative is to roll your own in a week.

FastWorker started as “what if we didn’t?”

The shape of the problem

Most Python web services need a task queue for the same handful of reasons: offload something slow from a request handler, send an email, resize an image, call a flaky external API, run a nightly rollup. The volume is in the low thousands of tasks per minute, the tasks are short, and the team is small. In that regime, the standard playbook — Celery + Redis — is wildly overbuilt. You’re paying for features (durable queues, delivery guarantees, complex workflows, geo-distribution) you will never use, in the form of services you must run, monitor, secure, and be paged about.

The question we kept asking was: if the broker’s only job is to hold tasks for a few seconds between producer and consumer, and both producer and consumer are Python processes we already run, why is there a broker at all?

The design constraints

We wrote down what we’d need to ship:

  1. No external services. Not Redis, not RabbitMQ, not Kafka. Just Python processes.
  2. A dashboard in the box. If you can’t see what’s happening, you don’t have a task queue — you have an outage waiting to happen.
  3. FastAPI-native. The producer lives inside async request handlers. The client API has to be async.
  4. Priority out of the box. Every task queue needs it; nobody wants to configure it.
  5. Auto-discovery. Scale out by running another process, not by editing a config file.
  6. Small enough for one person to understand. The whole point of simplicity is that the next engineer can read the code.

That list ruled out most prior art. It pointed toward a design where coordination logic lives in a first-class Python process that clients and workers talk to directly.

Why NNG

The messaging layer mattered more than we expected. We needed request/reply (for task submission), dealer/router (for dispatch), and pub/sub (for worker discovery and heartbeats). We did not want to reimplement any of that.

NNG — the successor to nanomsg — gives you all three patterns in a clean, well-tested C library. pynng wraps it for Python with a proper async API. We built FastWorker on NNG instead of raw TCP sockets, ZeroMQ, or asyncio streams for three reasons:

  1. The socket patterns are exactly what we needed. No adaptation layer.
  2. Async-first. pynng’s API works naturally with FastAPI’s event loop.
  3. One dependency, one install, one wheel. It doesn’t pull in a broker or a server.

The messaging layer is the one place we deliberately chose a dependency instead of writing it ourselves.

The control plane as a first-class process

The biggest design move was putting coordination in a first-class Python process — the “control plane” — rather than a library inside each worker. That’s what lets FastWorker skip the broker entirely. The control plane:

  • Holds the task queue (in memory, priority-sorted)
  • Tracks which subworkers are connected and how loaded each is
  • Dispatches tasks to the least-loaded subworker
  • Caches results in an LRU
  • Serves the built-in web dashboard
  • Will execute tasks itself if there are no subworkers

That last bullet matters. A minimal deployment is one process — the control plane — that plays both coordinator and worker. You only add subworkers when you need to scale out. Most deployments will never need them.
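The first three bullets can be sketched in a few lines of stdlib Python. This is not FastWorker’s implementation — the class, the priority convention, and the worker IDs are all hypothetical — but it illustrates what “in memory, priority-sorted” plus “least-loaded dispatch” means:

```python
import heapq
import itertools

class ControlPlaneSketch:
    """Illustrative only: an in-memory priority queue plus
    least-loaded dispatch, as the control-plane section describes."""

    def __init__(self):
        self._queue = []               # heap of (priority, seq, task)
        self._seq = itertools.count()  # tie-breaker keeps FIFO order per priority
        self.worker_load = {}          # worker id -> number of in-flight tasks

    def submit(self, task, priority=10):
        # Lower number = higher priority (an assumed convention).
        heapq.heappush(self._queue, (priority, next(self._seq), task))

    def dispatch(self):
        # Pop the highest-priority task, hand it to the least-loaded worker.
        if not self._queue or not self.worker_load:
            return None
        _, _, task = heapq.heappop(self._queue)
        worker = min(self.worker_load, key=self.worker_load.get)
        self.worker_load[worker] += 1
        return worker, task

cp = ControlPlaneSketch()
cp.worker_load = {"worker-a": 2, "worker-b": 0}
cp.submit("send_email", priority=10)
cp.submit("resize_image", priority=1)  # jumps the queue
first = cp.dispatch()
print(first)  # ('worker-b', 'resize_image')
```

The real control plane layers timeouts, heartbeats, and an LRU result cache on top, but the core dispatch decision is this small.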

The dashboard is the product

You don’t really have a task queue until someone who isn’t you can look at a dashboard and tell you what’s happening. That’s a hard-won lesson from every production Celery + Flower setup we’ve operated. So the dashboard ships with the control plane, starts automatically, and shows real-time queue depth, worker status, and task history. It’s not optional and it’s not a separate install. You get it when you pip install fastworker.

What we gave up

We were honest with ourselves about the tradeoffs:

  • No durable persistence. If the control plane crashes, queued tasks are lost. For the target use case (short tasks, modest volume), that’s acceptable — re-drive from your application state.
  • No workflows. No chains, no groups, no DAGs. If you need those, you need Celery, Temporal, or Prefect.
  • No scheduled tasks. Cron or Kubernetes CronJobs do the job fine.
  • No cross-language workers. Python only.
  • No extreme scale. We target 1K–10K tasks/minute, not 100K+.

We wrote all of this into the limitations document on day one. If your needs are outside that envelope, FastWorker isn’t for you, and we’d rather you know now than find out in production.

What it feels like to use

You install the package. You write a file with a couple of @task-decorated functions. You start the control plane in one terminal. You open http://127.0.0.1:8080 and see the dashboard. You add await client.delay(...) to a FastAPI handler. You reload the page. The task runs.
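To show the shape of those ergonomics — and only the shape, since this is a toy and not FastWorker’s API — here is a stdlib-only sketch of what an `@task` decorator with an awaitable `delay` can look like. The real thing dispatches over NNG to the control plane; this version just offloads to a thread so a blocking function doesn’t stall the handler’s event loop:

```python
import asyncio

def task(fn):
    # Toy decorator: attach an async .delay() to the function,
    # mimicking the ergonomics described above.
    async def delay(*args, **kwargs):
        # Offload the blocking call to a thread so the async
        # request handler keeps serving other requests.
        return await asyncio.to_thread(fn, *args, **kwargs)
    fn.delay = delay
    return fn

@task
def resize_image(name):
    return f"resized {name}"

async def handler():
    # Inside a FastAPI-style async handler:
    return await resize_image.delay("photo.jpg")

print(asyncio.run(handler()))  # resized photo.jpg
```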

There is no broker. There is no YAML. There is no on-call rotation for a service you didn’t want to run. That’s the whole pitch, and we think it’s enough.

Where to go from here

FastWorker is alpha. We’re actively developing it, scoping carefully, and shipping under MIT. If the tradeoffs above sound like your tradeoffs, try it for an afternoon and tell us where it breaks. If you’re evaluating alternatives, start with the FastWorker vs Celery comparison — we wrote it to be honest, not promotional.

And if you want help taking FastWorker, FastAPI, or async Python into production, we do consulting.

Frequently asked questions

Does the world need another task queue?

Maybe not. But most Python teams already burn days on broker infrastructure for what should be a solved problem. FastWorker is the argument that the broker itself is optional.

Why NNG instead of raw sockets?

NNG gives us REQ/REP, DEALER/ROUTER, and PUB/SUB patterns for free, plus a clean async story in Python through pynng. Rolling those ourselves would have been months of work and a decade of bugs.