Reputation: 309
Does the Kafka rebalancing algorithm work across topics?
Suppose I have 5 topics, each with 10 partitions, and 20 instances of a consumer application in the same consumer group, each subscribed to all 5 topics.
Will Kafka try to balance 50 partitions evenly across 20 instances?
Or will it balance only within each topic, so that the first 10 instances may (or are likely to) receive all 50 partitions while the other 10 instances stay idle?
I know that older versions of Kafka did not balance across topics, but what about current versions?
Upvotes: 5
Views: 1609
Reputation: 18475
The assignment of consumer instances to partitions depends on the Consumer Configuration partition.assignment.strategy. Its default value is class org.apache.kafka.clients.consumer.RangeAssignor, but you can also select RoundRobinAssignor or StickyAssignor, or you can even build your own strategy by extending the abstract class AbstractPartitionAssignor.
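As a rough sketch of what selecting a strategy looks like, here is the setting written as a plain configuration mapping. Note the assumptions: the answer above refers to the Java client, where the value is a class name, while the short name "roundrobin" below is the form used by librdkafka-based clients such as confluent-kafka for Python. The broker address and group id are placeholders, not values from the question.

```python
# Hypothetical consumer configuration; broker address and group id are placeholders.
consumer_config = {
    "bootstrap.servers": "localhost:9092",
    "group.id": "my-consumer-group",
    # librdkafka-based clients take short names such as "range" or "roundrobin".
    # The Java client takes a class name instead, e.g.
    # org.apache.kafka.clients.consumer.RoundRobinAssignor
    "partition.assignment.strategy": "roundrobin",
}

# With the confluent-kafka package installed, the mapping would be passed as:
#   from confluent_kafka import Consumer
#   consumer = Consumer(consumer_config)
print(consumer_config["partition.assignment.strategy"])
```

All instances in the group should use the same strategy, since the assignment is computed for the group as a whole.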
I think that in your case the RoundRobin assignment strategy would lead to a more balanced assignment. The difference between the Range and RoundRobin strategies is depicted in the diagram below.
In your case (10 partitions in each topic and 20 consumer instances) the Range strategy would leave 10 instances idle. The RoundRobin strategy, however, would keep all instances busy, as it follows the principle: "The partitions will be uniformly distributed in that the largest difference between assignments should be one partition."
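The numbers for this case can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch using the figures from the question (5 topics, 10 partitions each, 20 consumers, every consumer subscribed to every topic); the variable names are made up for illustration.

```python
topics, partitions_per_topic, consumers = 5, 10, 20

# Range works topic by topic: a topic can keep at most one consumer busy
# per partition, and the same leading consumers are picked for every topic.
busy_with_range = min(partitions_per_topic, consumers)
idle_with_range = consumers - busy_with_range

# RoundRobin deals out the partitions of all topics together.
total_partitions = topics * partitions_per_topic
base, extra = divmod(total_partitions, consumers)

print(idle_with_range)  # 10 consumers idle under Range
print(base, extra)      # 2 10: every consumer gets 2, and 10 of them get one more
```

So with RoundRobin ten consumers hold 3 partitions and ten hold 2, which matches the "largest difference should be one partition" principle.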
Please note that the assignment of consumers to topic partitions is different from a rebalance. A rebalance is initiated when:
A consumer leaves the consumer group (e.g. by failing to send a heartbeat or by explicitly requesting to leave)
A new consumer joins the consumer group
A consumer changes its topic subscriptions
A subscribed topic changes, such as a change in its number of partitions
During a rebalance, consumption is paused for the entire consumer group and the partitions are assigned again based on your selected strategy.
Upvotes: 4
Reputation: 1092
You can choose RoundRobin as the partition assignor instead of the default Range assignor to get all instances consuming.
Range Assignor:
The Range assignor works on each topic separately. It divides the topic's partitions into ranges based on the total number of consumers. The consumers are then sorted in lexicographic order and each consumer takes one range of partitions.
In your case, you have 10 partitions for each topic and 20 consumers in total, so the assignor gives 1 partition to each of the first 10 consumers and the remaining 10 consumers get nothing for that topic.
The same thing happens for every topic, so the first 10 consumers are each assigned 5 partitions (1 from each topic) and the other 10 stay idle.
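To make the per-topic behaviour concrete, here is a small simulation of range-style assignment. It is a simplified sketch, not Kafka's actual code: the function name, the consumer ids and the topic names are invented, every consumer is assumed to subscribe to every topic, and partitions are numbered from 0 as Kafka does (the examples in this answer count from 1).

```python
def range_assign(topics, consumers):
    """topics maps topic name -> partition count.
    Returns consumer id -> list of (topic, partition) tuples."""
    members = sorted(consumers)  # lexicographic order
    assignment = {m: [] for m in members}
    for topic in sorted(topics):
        # Each topic is split on its own, ignoring all other topics.
        per_consumer, extra = divmod(topics[topic], len(members))
        start = 0
        for i, member in enumerate(members):
            # The first `extra` consumers get one partition more.
            count = per_consumer + (1 if i < extra else 0)
            assignment[member].extend((topic, p) for p in range(start, start + count))
            start += count
    return assignment


topics = {f"t{i}": 10 for i in range(1, 6)}
consumers = [f"consumer-{i:02d}" for i in range(1, 21)]
result = range_assign(topics, consumers)

idle = [c for c, parts in result.items() if not parts]
print(len(idle))              # 10
print(result["consumer-01"])  # partition 0 of each of the 5 topics
```

Because the split restarts from the first consumer for every topic, the same ten consumers are chosen five times over.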
Round-Robin Assignor:
The Round-Robin assignor lists all partitions of all topics subscribed to by the consumer group, and the consumers then take partitions from that list in round-robin order.
In your case, the list of all partitions looks like:
t1p1, t1p2, t1p3 ... t5p9, t5p10
All 20 consumers take partitions in this order, so you finally get:
Consumer1: t1p1, t3p1, t5p1
Consumer2: t1p2, t3p2, t5p2
.
.
.
Consumer10: t1p10, t3p10, t5p10
Consumer11: t2p1, t4p1
.
.
.
Consumer20: t2p10, t4p10
This is more balanced than the Range assignor: every consumer receives 2 or 3 partitions and none stays idle.
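The round-robin distribution above can be reproduced with a short simulation. Again this is only an illustrative sketch under assumptions: the function and consumer names are hypothetical, all consumers subscribe to all topics, and partitions are numbered from 0, so t1p1 in the listing above corresponds to ("t1", 0) here.

```python
from itertools import cycle


def round_robin_assign(topics, consumers):
    """topics maps topic name -> partition count.
    Returns consumer id -> list of (topic, partition) tuples."""
    members = sorted(consumers)
    assignment = {m: [] for m in members}
    # One flat list of every partition of every topic.
    all_partitions = [(t, p) for t in sorted(topics) for p in range(topics[t])]
    # Deal the partitions out to the consumers in turn.
    for member, tp in zip(cycle(members), all_partitions):
        assignment[member].append(tp)
    return assignment


topics = {f"t{i}": 10 for i in range(1, 6)}
consumers = [f"consumer-{i:02d}" for i in range(1, 21)]
result = round_robin_assign(topics, consumers)

sizes = [len(parts) for parts in result.values()]
print(min(sizes), max(sizes))  # 2 3
print(result["consumer-01"])   # [('t1', 0), ('t3', 0), ('t5', 0)]
print(result["consumer-20"])   # [('t2', 9), ('t4', 9)]
```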
Upvotes: 2