Many small queues in Kafka - how to maintain scale-out load-balancing?
I am building a message distribution system using Kafka. It will handle tens of thousands of events per second (all of uniform structure), and will have thousands of possible recipients. Messages will arrive at the system, get queued in Kafka and then get dispatched to the recipient. The requirements are:
- Message order for a specific recipient must be preserved, and no loss of messages is acceptable.
- The rate at which messages for each recipient arrive and the rate each recipient handles messages can differ wildly, and recipients may have lengthy downtimes (e.g. a week), so each recipient needs its own queue to progress (or stall) at its own rate.
- A stalled recipient shouldn't affect the flow of messages for any other recipients, and it shouldn't hurt throughput either.
- New recipients can be added at any time during runtime, and the system should start dispatching messages to the new recipient within a reasonable time (but doesn't have to be immediate).
- The application that consumes and processes the messages from Kafka and dispatches them to the various recipients should be able to scale out to multiple nodes. Each instance should handle a share of the work, whether divided by message-processing capacity, recipient count or some other scheme. It doesn't have to be perfectly balanced, but it should be scalable at runtime with no downtime, and recover from node failures.
Being new to Kafka, I'm not sure how to model this. At first I was thinking of a topic per recipient, with one partition per topic. I've read that Kafka 2.0 can support a very large number of topics, so hopefully that's not a problem.
- You can use patterns to subscribe to multiple topics, and the subscription refreshes automatically at a regular interval. So any new recipient (having its own topic) would automatically have its messages consumed by some node.
- But then what mechanism would divvy up the topics between the application nodes? Processing order must be preserved so each topic should only be handled by a single node.
- When an application node comes up, how would it know which topics it should consume? When a node goes down, how will its topics be distributed to the other nodes?
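To make the pattern-subscription idea concrete, here is a minimal sketch. The topic naming scheme (`recipient-<name>`), the group id and the broker address are my assumptions, not from the question; the actual client call is shown in a comment because it needs a running broker.

```python
import re

# Assumed naming convention: one topic per recipient, "recipient-<name>".
TOPIC_PATTERN = re.compile(r"^recipient-.+$")

def matches(topic: str) -> bool:
    """True if a topic name falls under the subscription pattern."""
    return TOPIC_PATTERN.match(topic) is not None

# With kafka-python, a dispatcher node would subscribe like this
# (requires a broker, so it is illustrated rather than executed):
#
#   from kafka import KafkaConsumer
#   consumer = KafkaConsumer(
#       bootstrap_servers="localhost:9092",   # assumed address
#       group_id="dispatcher",                # assumed group name
#       enable_auto_commit=False,             # commit only after dispatch
#   )
#   consumer.subscribe(pattern=TOPIC_PATTERN.pattern)
#
# The client re-checks the pattern against topic metadata periodically
# (metadata.max.age.ms, 5 minutes by default), which is why new
# recipient topics get picked up automatically, but not immediately.

print(matches("recipient-alice"))  # True
print(matches("orders"))           # False
```

This covers the "new recipients appear at runtime" requirement; the open questions below are about how the topics get divided between nodes.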
This sounds like the mechanism of consumer groups. So I was looking into one partition per recipient. In Kafka, each partition is its own queue that can progress at its own pace, and partitions are handed out and divided between the consumers in a consumer group automatically, just what I need! But the problem with partitions is that they are meant as a load-balancing mechanism for a single stream of data, so they have a few limitations.
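For intuition about that hand-out, the behavior of a consumer group can be pictured with a toy model of the round-robin assignor (stdlib only, not the actual Kafka client; node names are made up):

```python
def assign_round_robin(partitions, consumers):
    """Toy model of Kafka's RoundRobinAssignor: spread partitions
    evenly across the live members of a consumer group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# Three app nodes share six partitions:
print(assign_round_robin(range(6), ["node-a", "node-b", "node-c"]))
# {'node-a': [0, 3], 'node-b': [1, 4], 'node-c': [2, 5]}

# When node-c dies (or a node joins), the group rebalances and the
# same logic re-spreads the partitions over the survivors:
print(assign_round_robin(range(6), ["node-a", "node-b"]))
# {'node-a': [0, 2, 4], 'node-b': [1, 3, 5]}
```

Each partition lands on exactly one consumer, which is what preserves per-partition ordering across scale-out and node failures.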
- Partitions are not entirely dynamic. Having a partition per recipient would mean adding a partition every time a new recipient was added to the system. This would trigger a rebalance, and it seems to mix functional and non-functional concerns in an inappropriate way, coupling a business entity to infrastructure configuration.
- Partitions are numbered, so how would I map a recipient name (a string) to a partition number consistently in a 1-to-1 fashion? I guess I could use a sequence generator to number my recipients, but that feels like a hack on top of a wrong solution. If I ever need to delete a recipient, that would leave a hole in the numbering. I don't want the possibility of more than one recipient mapped to the same partition because a stall in one recipient would affect the others.
- Should I pre-allocate partitions to prevent rebalancing? If I have 5,000 recipients and the number is expected to grow, should I define 20,000 partitions and just have 75% of them sit unused for now? That would prevent a rebalance every time a recipient is added, but it feels like a hack.
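The mapping problem above can be made concrete with a stdlib-only sketch: hashing the recipient name (similar in spirit to Kafka's default key partitioner, though Kafka uses murmur2 rather than crc32) gives a *consistent* mapping, but never a 1-to-1 one, so two recipients can share a partition and stall each other. The recipient names and partition count here are invented for illustration.

```python
import zlib

NUM_PARTITIONS = 8  # arbitrary, for illustration

def partition_for(recipient: str) -> int:
    # crc32 is stable across processes and runs (unlike Python's
    # built-in hash(), which is salted per process).
    return zlib.crc32(recipient.encode()) % NUM_PARTITIONS

# 100 recipients into 8 partitions: by the pigeonhole principle,
# some partition must hold more than one recipient.
assignments = {}
for i in range(100):
    r = f"recipient-{i}"
    assignments.setdefault(partition_for(r), []).append(r)

print(max(len(v) for v in assignments.values()) > 1)  # True
```

So with hashing, a stall in one recipient can block its partition-mates, which is exactly the failure mode the question wants to rule out; a collision-free mapping would need an explicit recipient-to-partition registry, i.e. the sequence-generator hack.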
How should I use Kafka to solve this queuing problem? Or perhaps Kafka isn't the right tool for the job?
Answers (1)
I don't think Kafka is a good fit for this use case. It wasn't designed for a huge number of queues and downstream consumers. It also relies on time-based retention, which doesn't play well with lengthy consumer downtimes.
I would recommend looking into Cadence Workflow to implement your application.
Cadence offers a lot of other advantages over using queues for task processing.
- Dynamically created task queues. The number of queues is unlimited.
- Built-in exponential retries with an unlimited expiration interval
- Failure handling. For example, it allows executing a task that notifies another service if both updates couldn't succeed within a configured interval.
- Support for long running heartbeating operations
- Ability to implement complex task dependencies, for example chaining of calls, or compensation logic in case of unrecoverable failures (SAGA)
- Gives complete visibility into the current state of the update. With queues, for example, all you know is whether there are some messages in a queue, and you need an additional DB to track the overall progress. With Cadence, every event is recorded.
- Ability to cancel an update in flight.
See the presentation that goes over the Cadence programming model.