StasM
StasM

Reputation: 11032

Suboptimal partitioning in Kafka consumer group

I have encountered a peculiar problem working with Kafka consumers. When I have a topic with a number of partitions, and a consumer group, the consumption eventually becomes unbalanced if consumer number is less than partition number. For example, if I have 8 partitions and 4 consumers, I see something like this:

Client Partition Lag
C1     P0        1000000
C1     P1        1000000
C2     P2        0
C2     P3        0
C3     P4        1000000
C3     P5        1000000
C4     P6        0
C4     P7        0

So some clients have zero lag and are doing nothing, and some have large lag and are working hard but are left behind. Note that I could of course have 8 clients, but given the workload I don't need 8 clients, I need only four, it's just that Kafka allocates partitions in a way that in fact only two of the four can work. I could also allocate partitions manually but that would complicate the application logic a lot, I'm quite happy with using Kafka consumer group capabilities, except for this one annoying balance problem.

So, I wonder if there are any means to automatically adjust for this - i.e. somehow reassign the clients in a way that would distribute the work equally. I know there was a proposal for something like that but it seems like nothing is happening there. So I wonder if there's any way to do it automatically within existing means. I am using kafka-python driver now, so ideally the solution would be implementable in Python, without requiring to move all the system to Java.

Upvotes: 2

Views: 308

Answers (1)

Jessica Vasey
Jessica Vasey

Reputation: 392

Unfortunately, there is no way to guarantee absolute balance and you would need to manually configure the partition assignment for each consumer in the consumer group.

If you are using the kafka-python driver, it could be something like the example below…

>>> # manually assign the partition list for the consumer
>>> from kafka import TopicPartition
>>> consumer = KafkaConsumer(bootstrap_servers='localhost:1234')
>>> consumer.assign([TopicPartition('foobar', 2)])
>>> msg = next(consumer)

See link here for more: https://kafka-python.readthedocs.io/en/master/

This second link provides a good overview of the different partition assignment strategies but please be aware that the examples are in Java: https://medium.com/streamthoughts/understanding-kafka-partition-assignment-strategies-and-how-to-write-your-own-custom-assignor-ebeda1fc06f3

Hope that helps and if you need any more detail please comment!

Upvotes: 1

Related Questions