Greg
Greg

Reputation: 11512

How can I scale Kafka consumers?

I'm reading the Kafka documentation and noticed the following line:

Note however that there cannot be more consumer instances in a consumer group than partitions.

Hmm. How can I auto-scale this?

For example let's say I have a messaging system with hi/lo priorities, so I create a topic for messages and partitions for hi and lo priority messages.

If this was RabbitMQ, I'd have an auto-scalable group of consumers assigned to each partition, like this:

enter image description here

If I understand the Kafka model I can't have >1 consumer per partition in a consumer group, so that picture doesn't work for Kafka, right?

Ok, so what about >1 consumer groups like this:

enter image description here

That get's around Kafka's limitation but... If I understand how this works both consumer groups would be pulling from a partition, for example msg.hi, with their own offsets so neither would know about the other--meaning messages would likely be delivered twice!

How can I achieve the capability I had in the Rabbit design w/Kafka and still maintain the "queue-ness" of the behavior (I don't want to send a message twice)? What am I missing?

Upvotes: 30

Views: 23240

Answers (4)

Siraf
Siraf

Reputation: 1292

You can also use an AI based auto scaler like this https://www.confluent.io/events/kafka-summit-americas-2021/intelligent-auto-scaling-of-kafka-consumers-with-workload-prediction/

This scaler calculates the right number of consumer PODs based on workload prediciton and target KPI metrics

enter image description here

Upvotes: 0

vins
vins

Reputation: 15370

TL;DR

Topic is made up of partitions. Partitions decide the max number of consumers you can have in a group.

Scenario 1:

enter image description here

When we have only one consumer, It can read all the messages from all the partitions.

Scenario 2:

In the above set up, when you increase the number of consumers in the group, partition reassignment happens and instead of consumer 1 reading all the messages from all the partitions, consumer 2 could share some of the load with consumer 1 as shown below. enter image description here

Scenario 3:

What happens If I have more number of consumers than the number of partitions.? Each consumer would be assigned 1 partition. Any additional consumers in the group will be sitting idle unless you increase the number of partitions for a Topic.

enter image description here

Summary:

We need to choose the partitions accordingly. That decides the max number of consumers in the group. Changing the partition for an existing topic is really NOT recommended as It could cause issues.

That is, Let's assume a producer producing names into a topic where we have 3 partitions. All the names starting with A-I go to Partition 1, J-R in partition 2 and S-Z in partition 3. Let's also assume that we have already produced 1 million messages. Now if you suddenly increase the number of partitions to 5 from 3, It will create a different A-Z range now. That is, A-F in Partition 1, G-K in partition 2, L-Q in partition 3, R-U in partition 4 and V-Z in partition 5. Do you get it? It kind of affects the order of the messages we had before! So you need to be aware of this. If this could be a problem, then we need to choose the partition accordingly upfront.


More info is here - http://www.vinsguru.com/kafka-scaling-consumers-out-for-a-consumer-group/

Upvotes: 25

David Griffin
David Griffin

Reputation: 13927

Just create a bunch of partitions for hi and lo. 12 is a good number. So is 60. Just pick a number of partitions that matches how much maximum parallelization you want.

Honestly, although I personally would makemsg.hi and msg.lo be different topics entirely, that's not a requirement -- you can do custom parititoning to divide messages between partitions.

Upvotes: 8

Marko Bonaci
Marko Bonaci

Reputation: 5708

Your assumption about messages being consumed twice is correct (since each group consumes 100% of messages from a topic).
I agree with David. Moreover, I suggest that you create more partitions than you really need, which would leave you some headroom to increase the number of threads in the group when such a need arises.

You can always increase the number of partitions later (and/or add additional brokers), but it's nice to have that already done, so that you can only increase number of threads and be done with it (those situations usually require a quick response, so you should do all the prep. that you can do in advance).

Upvotes: 9

Related Questions