rwachter

Reputation: 33

Kafka Streams partition assignment problem with multiple topics and scaling

I'm working on a Kafka Streams application that consumes three topics in a single consumer group. One topic has 20 partitions, another 10, and the last one 5, so the consumer group covers 35 partitions in total.

The Streams application runs in a Kubernetes environment and is scaled by running multiple instances of the application as pods in a single deployment. The goal is to scale to 35 pods (and thus 35 consumers) and have each partition assigned to a single consumer, allowing maximum parallelism.

However, what I see is co-partitioning as partitions are assigned while the application scales up: one consumer gets partition 0 from all three topics, another gets partition 1 from all three topics, and so on. This caps the maximum parallelism I can achieve at 20. If I had 35 consumers, only 20 would be active.

It is my understanding that I cannot break away from the co-partitioning behavior with Kafka Streams, as the partition assignment strategy is not configurable. It is behavior that I neither want nor need. I have considered a few solutions, but I'm not sure which approach is best, and I'm looking for a little direction on how to proceed.

  1. Accept that the maximum parallelism for this application will be the partition count of the largest topic in the consumer group. When lag on the topics is high, this leaves some consumers processing a lot of data and others processing very little.

  2. Have each topic consumed by a separate stream in its own consumer group. This is a problem because consumer groups operate independently, and by default there is no way to ensure that 35 consumers are assigned 35 partitions 1:1 across multiple consumer groups. There will very likely still be idle consumers.

  3. A variation of the above where each topic still gets its own consumer group/stream, but consumers are assigned to consumer groups dynamically as pods come up and down to keep the assignment balanced. This can be forced using the Kafka Admin API and the Kubernetes API, but it would be complicated and time-consuming to implement and maintain.

  4. Give all topics in the consumer group the same number of partitions; for example, all three topics with 20 partitions. This ensures that with 20 consumers, all 20 consumers are assigned partitions. The downside is that I am on Confluent Cloud, so this comes at an increased cost, but it is by far the simplest solution (a sketch of the partition increase follows this list).
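
For reference, increasing partition counts for option #4 can be scripted with the Kafka Admin API. A minimal sketch, where the bootstrap address and the topic-b/topic-c names are hypothetical placeholders for my 10- and 5-partition topics:

import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitions;

public class EqualizePartitions {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical address

        try (Admin admin = Admin.create(props)) {
            // Raise the smaller topics to 20 partitions each. Partition counts
            // can only be increased, never decreased, and existing keyed
            // records are not re-shuffled into the new partitions.
            admin.createPartitions(Map.of(
                    "topic-b", NewPartitions.increaseTo(20),
                    "topic-c", NewPartitions.increaseTo(20)))
                 .all().get();
        }
    }
}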

I'm leaning towards #1 or #4 as a solution, but I'm curious whether my understanding is wrong or there is an easier/better solution out there.

Thank you!

Upvotes: 1

Views: 42

Answers (1)

Matthias J. Sax

Reputation: 62350

What you observe is by design. In the end, Kafka Streams does not scale via partitions, but via tasks. So you need to break your program into smaller independent "pieces" to get more tasks.

Given that you are reading from three input topics, I assume you are using something like this:

streamsBuilder.stream("t1, "t2", "t3").map(...)...

This program effectively merges the three input topics into a single KStream, and thus you get only a single sub-topology, which will scale to N tasks, with N = max(t1.partitions, t2.partitions, t3.partitions), as you observed.

Cf. https://docs.confluent.io/platform/current/streams/architecture.html#sub-topologies-also-called-sub-graphs

However, you can re-write your program into three independent pieces and thus create a sub-topology per input topic:

streamsBuilder.stream("t1).map(...)...
streamsBuilder.stream("t2).map(...)...
streamsBuilder.stream("t3).map(...)...

(To avoid boilerplate code, you can extract a helper method that takes a KStream as input and applies the actual business logic, and call the helper once for each of the three KStreams, one per topic; a sketch follows below.)
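
A minimal sketch of that refactoring; the topic names, the applyBusinessLogic helper, and the toUpperCase placeholder are hypothetical stand-ins for your actual program:

import java.util.List;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

public class PerTopicTopology {

    // Hypothetical helper applying the shared business logic to one KStream.
    static void applyBusinessLogic(KStream<String, String> stream, String outputTopic) {
        stream
            .mapValues(v -> v.toUpperCase()) // placeholder for the real logic
            .to(outputTopic);
    }

    static StreamsBuilder buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();
        // One stream() call per topic creates one sub-topology per topic,
        // so each topic scales independently to its own partition count.
        for (String topic : List.of("t1", "t2", "t3")) {
            applyBusinessLogic(builder.stream(topic), topic + "-out");
        }
        return builder;
    }
}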

If you have three sub-topologies, each sub-topology gets its own set of tasks, and thus you should be able to scale out to 35 tasks in your setup, as each sub-topology now scales to N tasks for the N partitions of its corresponding input topic.

To understand the structure of your program, you can inspect the TopologyDescription you get from Topology#describe().
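
For example, assuming the same hypothetical t1/t2/t3 topics as above, printing the description shows how many sub-topologies the program has:

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;

public class DescribeTopology {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("t1").mapValues(v -> v);
        builder.stream("t2").mapValues(v -> v);
        builder.stream("t3").mapValues(v -> v);

        Topology topology = builder.build();
        // With three independent stream() calls, the printed description
        // should list three sub-topologies, one per input topic.
        System.out.println(topology.describe());
    }
}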

Upvotes: 0
