SBhogal

Reputation: 147

Kafka Streams Co-Partitioning is required while joining two KStreams

I recently started reading about Kafka Streams for an upcoming project and came across the statement that co-partitioning is required in order to join two streams. As far as I understand it, if we have two topics A and B, both must have the same number of partitions, and a given key, say 'X', must land in the same partition number in both topics.

For example, if topic A has partitions A0, A1, A2 and topic B has partitions B0, B1, B2, then a message with key 'X' must be published to A0 and B0 respectively.

Question: why must the partition number be the same in both topics for key 'X', and what issues might we face if both topics have the same number of partitions but some partitions are idle, i.e. messages are not distributed evenly across the partitions?

Upvotes: 0

Views: 592

Answers (1)

nipuna

Reputation: 4105

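Kafka Streams is built on top of the Kafka consumer group mechanism, so the partitions of your input topics are distributed among the application instances by a partition assignor. For a plain consumer the default is the range assignor; Kafka Streams configures its own assignor, which follows the same idea for topics that are joined. See the Kafka documentation on partition assignment for more.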

To join two streams, the messages that share a key must be available in the same consumer instance. Otherwise the instance that processes one message cannot find the other message to join it with. To guarantee that, both topics must have the same number of partitions and the messages must use the same key.
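A minimal sketch of why the two messages have to meet in the same instance. It is plain Python, not the Kafka Streams API: `local_join` is a hypothetical helper, and it ignores the time windows that a real stream-stream join uses. Each call only sees the records of one instance.

```python
def local_join(left_records, right_records):
    """Inner-join two lists of (key, value) pairs that ONE instance can see."""
    right_by_key = {}
    for key, value in right_records:
        right_by_key.setdefault(key, []).append(value)
    return [
        (key, left_value, right_value)
        for key, left_value in left_records
        for right_value in right_by_key.get(key, [])
    ]

# Co-partitioned: key "X" is in A0 and B0, and both are read by instance 1.
joined_ok = local_join([("X", "a1")], [("X", "b1")])

# Not co-partitioned: "X" is in A0 (instance 1) but in B1 (instance 2),
# so neither instance has both sides of the join.
joined_on_instance_1 = local_join([("X", "a1")], [])
joined_on_instance_2 = local_join([], [("X", "b1")])
```

In the second case both instances produce an empty result, even though a matching pair exists in the cluster as a whole.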

When both topics have the same number of partitions, the assignor makes sure that the same partition number of both topics (for example A0 and B0) is assigned to the same instance.
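The idea can be sketched roughly as follows. This is a simplified model and not Kafka's actual assignor code: `assign_partitions` is a hypothetical function, and the round-robin spreading over instances is an assumption made only to keep the example short. The point it illustrates is that partition i of every topic always travels together.

```python
def assign_partitions(topics, num_partitions, instances):
    """Give partition i of every topic to the same instance (simplified model)."""
    assignment = {name: [] for name in instances}
    for partition in range(num_partitions):
        # Assumption: simple round-robin; a real assignor uses more inputs.
        owner = instances[partition % len(instances)]
        for topic in topics:
            assignment[owner].append((topic, partition))
    return assignment

assignment = assign_partitions(["A", "B"], 3, ["instance-1", "instance-2"])
for owner, partitions in assignment.items():
    print(owner, partitions)
```

Whichever instance receives A0 also receives B0, so a key that lives in partition 0 of both topics can be joined locally.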

That is the Kafka side. On the application side, your producers must partition messages by hashing the key, which is what the default partitioner does. If both topics then have the same number of partitions, hashing guarantees that a given key goes to the same partition number in both topics.
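A small sketch of the hashing argument. The function `partition_for` is hypothetical, and `zlib.crc32` is only a stand-in: Kafka's default partitioner hashes the serialized key with murmur2, but any deterministic hash shows the same effect.

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition as hash(key) mod partition count."""
    # Stand-in hash (assumption); Kafka itself uses murmur2 on the key bytes.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

keys = [f"key-{i}" for i in range(100)]

# Topics A and B both have 3 partitions: every key gets the same number in both.
same_count_mismatches = [
    k for k in keys if partition_for(k, 3) != partition_for(k, 3)
]

# Topic A has 3 partitions, topic B has 4: many keys land on different numbers.
different_count_mismatches = [
    k for k in keys if partition_for(k, 3) != partition_for(k, 4)
]

print(len(same_count_mismatches), len(different_count_mismatches))
```

With equal partition counts the first list is always empty, because the same hash is reduced by the same modulus. With unequal counts the modulus differs, which is why the number of partitions has to match and not just the key.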

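The co-partitioning requirement in Kafka Streams exists to guarantee exactly these conditions. If your topics do not meet them, the data has to be repartitioned so that they do before the join can work.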

Upvotes: 1
