Reputation: 9771
while i understand the pre-requisite of having co-partitioning as explained here Why does co-partitioning of two Kstreams in kafka require same number of partitions for both the streams? , I do not understand the mechanism that make sure that the partitions of each topic that have the same key, get assigned to the same KAFKA Stream. I do not see how the consumer group of KAFKA would enable that.
The way i understand it is that, we have 2 independent consumer groups, which actually may have the same name, because it is the same kafka stream application, although the suscription to each topic is independent from each other.
Somehow, the consumers in each consumer group, get assigned to partition that contains the same key. I did not know that consumer assignment to partition could be related to the content of the partitions. So far i though it was random.
Can someone explain that part ?
Upvotes: 2
Views: 1451
Reputation: 62350
The way i understand it is that, we have 2 independent consumer groups, which actually may have the same name, because it is the same kafka stream application, although the suscription to each topic is independent from each other.
All members of a consumer group have the same "name" (ie, group.id
) -- it is not possible to have two consumer groups with the same name. It would be one consumer group.
although the suscription to each topic is independent from each other
For KafkaConsumer
it's possible to have different subscription for different members in the group (even if this should be a very rare scenario). For Kafka Streams however, it is required that all members of the group (ie, application instances) execute the exact some Topology
with the exact some input topics (ie, their subscription must be the same).
I did not know that consumer assignment to partition could be related to the content of the partitions. So far i though it was random.
That is correct.
From your own answer:
In other words, if the number of partitions is the same, and the partition strategy of each producer of the topic is the same, message with same key will be assigned in the same way on the partition range, which is assigned to the consumer in the same way, i.e. as consecutive subset of partitions from each topic. Hence The same stream task will always have partitions of both topics which have the same key.
That is also correct.
Note, that Kafka Streams uses a special partition assignor (not the default ones the consumer offers) to ensure co-partitioning, stickiness (ie, state-store awareness), and to assign standby-tasks.
Upvotes: 3
Reputation: 9771
After refreshing I found the two following statement that explains it all:
A consumer group has a unique id. Each consumer group is a subscriber to one or more Kafka topics.
Hence a consumer group may involve multiple topics with their partition and a strategy to assign them to the consumer of the group.
PARTITION.ASSIGNMENT.STRATEGY (In Kafka Definitive Guide)
A PartitionAssignor is a class that, given consumers and topics they subscribed to, decides which partitions will be assigned to which consumer. By default, Kafka has two assignment strategies:
In other words, if the number of partitions is the same, and the partition strategy of each producer of the topic is the same, message with same key will be assigned in the same way on the partition range, which is assigned to the consumer in the same way, i.e. as consecutive subset of partitions from each topic. Hence The same stream task will always have partitions of both topics which have the same key.
Upvotes: 1