Reputation: 792
My understanding is that kafka streams supports partitioning. I am wondering how that works when joining data from two different topics? I assume that in order to join data based on two different topics, the client app must some how guarantee that the messages it gets from both topics share the same key. Just wondering how kafka streams does this?
Upvotes: 4
Views: 3058
Reputation: 792
A piece of the puzzle is making sure that Kafka streams gets allocated the same partition number for both topics. To guarantee this, it connects to both topics using the same consumer instance, and then relies on a range assignor strategy to get the same partition number. See https://kafka.apache.org/24/javadoc/org/apache/kafka/clients/consumer/RangeAssignor.html
Upvotes: 0
Reputation: 401
There are couple of pre-requisets to be able to do stream-stream , ktable-ktable or stream-ktable joins;
TopologyBuilderException
at runtime when partitions are about to be assigned.Other than this requirement any join will work but in order it to be work correctly a number of additional requirements must be met, such as;
GlobalKTable joins don't have any of this requirements and will work with every topic regardless of partition count, partitioning strategy vs because all data for globalKTable will be presented to every single streams instance.
When messages are produced they will be sent to partitions based on their key and partitioning strategy, streams API assigns same topics partitions from each topic to the same processor so that all relevant messages from same topic having same key will be processed in same processor. For windowed joins message timestamps are considered to find messages to join for this particular window and emit the result once the join is done.
Upvotes: 8