Reputation: 911
In the Kafka Streams library, what is the difference between KTable and GlobalKTable?
Also, the KStream class has two methods, leftJoin() and outerJoin(). What is the difference between these two methods?
I read the documentation for KStream.leftJoin, but could not find the exact difference.
Upvotes: 15
Views: 24888
Reputation: 62350
A KTable shards the data across all running Kafka Streams instances, while a GlobalKTable has a full copy of all data on each instance. The disadvantage of a GlobalKTable is that it needs more memory. The advantage is that you can do a KStream-GlobalKTable join on a non-key attribute of the stream. For a KStream-KTable join, joining on a non-key stream attribute is only possible by extracting the join attribute and setting it as the key before doing the join; this results in a repartitioning step of the stream before the join can be computed.
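To make the sharding point concrete, here is a toy model in plain Java. It does not use the Kafka Streams API: the class `TableLookupSketch`, the hash-modulo partitioner, and the order/customer names are all assumptions made for illustration. Each "instance" is just a map, and a stream record is looked up in the store of the instance that owns the record's key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model (not the Kafka Streams API): sharded vs. fully replicated table lookups. */
public class TableLookupSketch {

    /** Assumed partitioner: non-negative hash of the key modulo the instance count. */
    static int partitionFor(String key, int instances) {
        return Math.floorMod(key.hashCode(), instances);
    }

    /** KTable-like layout: each instance holds only the entries whose key maps to it. */
    static List<Map<String, String>> shard(Map<String, String> table, int instances) {
        List<Map<String, String>> shards = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            shards.add(new HashMap<>());
        }
        for (Map.Entry<String, String> e : table.entrySet()) {
            shards.get(partitionFor(e.getKey(), instances)).put(e.getKey(), e.getValue());
        }
        return shards;
    }

    /** GlobalKTable-like layout: every instance holds a full copy of the table. */
    static List<Map<String, String>> replicate(Map<String, String> table, int instances) {
        List<Map<String, String>> copies = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            copies.add(new HashMap<>(table));
        }
        return copies;
    }

    /** Looks up joinAttribute in the store of the instance that owns streamKey. */
    static String localLookup(List<Map<String, String>> stores, String streamKey, String joinAttribute) {
        int instance = partitionFor(streamKey, stores.size());
        return stores.get(instance).get(joinAttribute);
    }

    public static void main(String[] args) {
        Map<String, String> customers = new HashMap<>();
        customers.put("c1", "Bob");
        customers.put("c2", "Alice");

        List<Map<String, String>> sharded = shard(customers, 2);
        List<Map<String, String>> global = replicate(customers, 2);

        // Order "o1" (the stream key) refers to customer "c2" (a non-key attribute).
        System.out.println(localLookup(sharded, "o1", "c2")); // null: "c2" lives on the other instance
        System.out.println(localLookup(global, "o1", "c2"));  // Alice: every instance has all rows
        // Re-keying the order by "c2" routes it to the instance that holds "c2".
        System.out.println(localLookup(sharded, "c2", "c2")); // Alice
    }
}
```

The last lookup is the toy equivalent of the repartitioning step: once the stream record is keyed by the join attribute, it lands on the same instance as the matching table row.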
Note, though, that there is also a semantic difference: for a stream-table join, Kafka Streams aligns record processing based on record timestamps. Thus, updates to the table are aligned with the records of your stream. For a GlobalKTable, there is no time synchronization, and thus updates to the GlobalKTable are completely decoupled from the processing of the stream records (so you get weaker semantics).
For further details, see KIP-99: Add Global Tables to Kafka Streams.
About left and outer joins: they work like a left outer join and a full outer join in a database, respectively.
For a left outer join, you might "lose" data from your right input stream if there is no match for it on the left-hand side.
For a (full) outer join, no data is dropped, and each input record of both streams will be in the result stream.
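As a rough illustration of the database analogy, here is a sketch in plain Java that joins two keyed inputs. It is not the Kafka Streams API and it ignores join windows; the class `JoinSemanticsSketch` and its helper names are made up for this example, and each input is assumed to have at most one value per key.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model (not the Kafka Streams API): left vs. full outer join on two keyed inputs. */
public class JoinSemanticsSketch {

    /** Renders one joined pair; a missing side shows up as the text "null". */
    static String pair(String leftValue, String rightValue) {
        return "(" + leftValue + ", " + rightValue + ")";
    }

    /** Left join: one result per left record; right-only records are dropped. */
    static Map<String, String> leftJoin(Map<String, String> left, Map<String, String> right) {
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, String> l : left.entrySet()) {
            result.put(l.getKey(), pair(l.getValue(), right.get(l.getKey())));
        }
        return result;
    }

    /** Outer join: the left join result plus every right record that had no left match. */
    static Map<String, String> outerJoin(Map<String, String> left, Map<String, String> right) {
        Map<String, String> result = leftJoin(left, right);
        for (Map.Entry<String, String> r : right.entrySet()) {
            result.putIfAbsent(r.getKey(), pair(null, r.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> left = new TreeMap<>();
        left.put("a", "L1");
        left.put("b", "L2");
        Map<String, String> right = new TreeMap<>();
        right.put("b", "R2");
        right.put("c", "R3");

        System.out.println(leftJoin(left, right));  // {a=(L1, null), b=(L2, R2)}
        System.out.println(outerJoin(left, right)); // {a=(L1, null), b=(L2, R2), c=(null, R3)}
    }
}
```

The right-only key "c" is missing from the left join result but present in the outer join result, which is the "lost" data mentioned above.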
Upvotes: 40