Reputation: 11032
I have run into a peculiar problem with Kafka consumers. When I have a topic with several partitions and a consumer group, consumption eventually becomes unbalanced if the number of consumers is smaller than the number of partitions. For example, with 8 partitions and 4 consumers, I see something like this:
Client  Partition  Lag
C1      P0         1000000
C1      P1         1000000
C2      P2         0
C2      P3         0
C3      P4         1000000
C3      P5         1000000
C4      P6         0
C4      P7         0
So some clients have zero lag and sit idle, while others have a large lag and are working hard but falling behind. I could of course run 8 clients, but the workload only needs four; the problem is that Kafka allocates partitions in such a way that only two of the four actually do any work. I could also assign partitions manually, but that would complicate the application logic considerably. I'm quite happy with Kafka's consumer group capabilities, except for this one annoying balance problem.
So I wonder whether there is any way to adjust for this automatically, i.e. to reassign partitions among the clients so that the work is distributed equally. I know there was a proposal for something like that, but nothing seems to be happening there, so I am looking for a way to do it with the existing means. I am using the kafka-python driver now, so ideally the solution would be implementable in Python, without having to move the whole system to Java.
Upvotes: 2
Views: 308
Reputation: 392
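Unfortunately, there is no way to guarantee absolute balance with the built-in group assignment; you would need to manually configure the partition assignment for each consumer in the consumer group. Note that a consumer that uses assign() does not take part in the group's automatic rebalancing, so your application has to react to consumers joining or leaving.
If you are using the kafka-python driver, it could look something like the example below: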
>>> # manually assign the partition list for the consumer
>>> from kafka import KafkaConsumer, TopicPartition
>>> consumer = KafkaConsumer(bootstrap_servers='localhost:1234')
>>> consumer.assign([TopicPartition('foobar', 2)])
>>> # block until the next message arrives on the assigned partition
>>> msg = next(consumer)
See the kafka-python documentation for more: https://kafka-python.readthedocs.io/en/master/
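To decide which partitions to hand to which consumer, you first need the lag per partition. Here is a minimal sketch, assuming `consumer` is a kafka-python `KafkaConsumer` created with the `group_id` of the group you want to inspect, and `partitions` is a list of `TopicPartition` objects. The helper name `lag_per_partition` is made up for illustration; `end_offsets()` and `committed()` are the consumer methods it relies on.

```python
from typing import Dict, Iterable


def lag_per_partition(consumer, partitions: Iterable) -> Dict[int, int]:
    """Return {partition id: end offset minus committed offset}."""
    partitions = list(partitions)
    # end_offsets() maps each TopicPartition to the offset of the next
    # message that will be written to it.
    end_offsets = consumer.end_offsets(partitions)
    lags = {}
    for tp in partitions:
        # committed() returns None when the group has not committed yet.
        # Simplifying assumption: treat that as offset 0, which overstates
        # the lag if old messages were already removed by retention.
        committed = consumer.committed(tp)
        lags[tp.partition] = end_offsets[tp] - (committed or 0)
    return lags
```

Keep in mind that this is a snapshot: a partition with a large lag right now is not necessarily the one that receives the most traffic over time.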
This article gives a good overview of the different partition assignment strategies and of writing a custom assignor, but please be aware that the examples are in Java: https://medium.com/streamthoughts/understanding-kafka-partition-assignment-strategies-and-how-to-write-your-own-custom-assignor-ebeda1fc06f3
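If you go the manual route, one possible way to pick the partitions for each consumer is a greedy split by lag: sort the partitions by lag and always give the next one to the consumer with the least total lag so far. This is only a sketch of that idea in plain Python, not something Kafka or kafka-python provides; the function name `balance_by_lag` is hypothetical, and the lag figures are the ones from the question.

```python
import heapq
from typing import Dict, List


def balance_by_lag(lag_by_partition: Dict[int, int], num_workers: int) -> List[List[int]]:
    """Greedy split: give the most-lagging partition to the least-loaded worker."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    # Heap entries are (total lag, partition count, worker index), so ties on
    # lag go to the worker that currently owns fewer partitions.
    heap = [(0, 0, w) for w in range(num_workers)]
    heapq.heapify(heap)
    plan: List[List[int]] = [[] for _ in range(num_workers)]
    # Largest lag first; ties broken by partition id for a stable result.
    ordered = sorted(lag_by_partition.items(), key=lambda kv: (-kv[1], kv[0]))
    for partition, lag in ordered:
        total, count, worker = heapq.heappop(heap)
        plan[worker].append(partition)
        heapq.heappush(heap, (total + lag, count + 1, worker))
    return plan


# Lag figures from the question: P0, P1, P4 and P5 are behind, the rest are caught up.
lags = {0: 1_000_000, 1: 1_000_000, 2: 0, 3: 0,
        4: 1_000_000, 5: 1_000_000, 6: 0, 7: 0}
plan = balance_by_lag(lags, num_workers=4)
print(plan)  # [[0, 2], [1, 3], [4, 6], [5, 7]]
```

Each consumer would then pass its own entry of `plan` to `consumer.assign()` as a list of `TopicPartition` objects. With this split every consumer gets one lagging and one idle partition, instead of two consumers owning all four lagging ones.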
Hope that helps. If you need any more detail, please comment!
Upvotes: 1