Allon Guralnek
Allon Guralnek

Reputation: 16121

Many small queues in Kafka - how to maintain scale-out load-balancing?

I am building a message distribution system using Kafka. It will handle tens of thousand of events per second (all of uniform structure), and will have thousands of possible recipients. Messages will arrive at the system, get queued in Kafka and then get dispatched to the recipient. The requirements are:

Being new to Kafka, I'm not sure how to model it. At first I was thinking a topic per recipient, with one partition per topic. I know that Kafka 2.0 can support an unlimited number of topics, so that's not a problem.

This sounds like the mechanism of consumer groups. So I was looking into one partition per recipient. In Kafka, each partition is it's own queue that can progress at it's own pace, and partitions are handed-out and divided between consumers in a consumer group automatically, just what I need! But the problem with partitions is that they are meant as a load-balancing mechanism for one stream of data so they have a few limitations.

How should I use Kafka to solve this queuing problem? Or perhaps Kafka isn't the right tool for the job?

Upvotes: 1

Views: 432

Answers (1)

Maxim Fateev
Maxim Fateev

Reputation: 6890

I don't think Kafka is a good good fit for such use cases. It wasn't designed for huge number of queues and downstream consumers. It also relies on time based retention which doesn't play well with lengthly consumer downtimes.

I would recommend looking into Cadence Workflow to implement your application.

Cadence offers a lot of other advantages over using queues for task processing.

  • Dynamically created task queues. The number of queues is unlimited.
  • Built it exponential retries with unlimited expiration interval
  • Failure handling. For example it allows to execute a task that notifies another service if both updates couldn't succeed during a configured interval.
  • Support for long running heartbeating operations
  • Ability to implement complex task dependencies. For example to implement chaining of calls or compensation logic in case of unrecoverble failures (SAGA)
  • Gives complete visibility into current state of the update. For example when using queues all you know if there are some messages in a queue and you need additional DB to track the overall progress. With Cadence every event is recorded.
  • Ability to cancel an update in flight.

See the presentation that goes over Cadence programming model.

Upvotes: 1

Related Questions