Using KSQL to perform a left outer join, I can see that the result of my join is sometimes emitted more than once.
In other words, the same join result is emitted more than once. I am not talking about one version of the joined record with a null value on the right side and another version without it. Literally the same record resulting from the join is emitted more than once.
I wonder whether that is expected behaviour.
The general answer is yes: Kafka is an at-least-once system. More specifically, a few scenarios can result in duplication:
Are you seeing any such crashes/timeouts in your logs?
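To make the at-least-once point concrete, here is a toy, in-memory Java model. It is not the Kafka client API, and the class and method names are hypothetical. It assumes a consumer that emits each record downstream and commits its offset every `commitInterval` records, a count-based stand-in for the time-based auto.commit.interval.ms. If the consumer crashes after emitting a record but before committing it, the restarted consumer resumes from the last committed offset and emits some records a second time.

```java
import java.util.ArrayList;
import java.util.List;

// Toy, in-memory model of an at-least-once consumer; NOT the Kafka client API.
// Names (AtLeastOnceDemo, run, commitInterval, crashAfter) are hypothetical.
public class AtLeastOnceDemo {

    // Emits records from `log`, committing the offset every `commitInterval`
    // records (a count-based stand-in for the time-based auto.commit.interval.ms).
    // The consumer crashes right after emitting record number `crashAfter`
    // (1-based), before it can commit, then restarts from the last commit.
    static List<String> run(List<String> log, int commitInterval, int crashAfter) {
        List<String> output = new ArrayList<>(); // everything sent downstream
        int committed = 0;                        // next offset to read after a restart

        // First run: emit, then commit periodically, until the crash.
        for (int offset = 0; offset < crashAfter; offset++) {
            output.add(log.get(offset));          // result is emitted downstream...
            if (offset == crashAfter - 1) {
                break;                            // ...but we crash before committing it
            }
            if ((offset + 1) % commitInterval == 0) {
                committed = offset + 1;           // commit: resume after this record
            }
        }

        // Restart: resume from the last committed offset; records that were
        // already emitted but not committed are emitted again (duplicates).
        for (int offset = committed; offset < log.size(); offset++) {
            output.add(log.get(offset));
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> log = List.of("a", "b", "c", "d", "e", "f");
        System.out.println(run(log, 4, 3)); // [a, b, c, a, b, c, d, e, f]
        System.out.println(run(log, 1, 3)); // [a, b, c, c, d, e, f]
    }
}
```

Committing after every record shrinks the set of re-emitted records from three to one, but a crash between emitting and committing still produces a duplicate, which is why tuning the commit interval narrows the window of duplication rather than closing it.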
There are a few Kafka features you could use to reduce the likelihood of this happening to you:
1. Set enable.idempotence to true in your producer configs (see https://kafka.apache.org/documentation/#producerconfigs). This incurs some overhead.
2. Set transactional.id on the producer in case you fail over across machines. This gets complicated to manage at scale.
3. Set isolation.level to read_committed on the consumer. This adds latency and needs to be done in combination with 2 above.
4. Lower auto.commit.interval.ms on the consumer. This only reduces the window of duplication and doesn't really solve anything; it also incurs overhead at really low values.
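A minimal sketch of the four settings above for plain Java clients, using string keys on java.util.Properties so it compiles without the kafka-clients jar. The broker address, group id and transactional id are hypothetical placeholders, and the settings are shown side by side only for brevity; they are independent options. Setting transactional.id alone does not make writes transactional: the producer also has to call initTransactions(), beginTransaction() and commitTransaction() on KafkaProducer. How these client settings map onto a KSQL deployment is not covered here.

```java
import java.util.Properties;

// Sketch only: plain string keys so this compiles without kafka-clients.
// In real code, pass these to new KafkaProducer<>(props) / new KafkaConsumer<>(props)
// together with the serializer/deserializer settings.
public class DedupConfigSketch {

    static Properties producerProps(String transactionalId) {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", "localhost:9092"); // hypothetical broker address
        // 1. Broker discards duplicates caused by producer retries.
        p.setProperty("enable.idempotence", "true");
        // 2. Stable id per logical producer, so a replacement instance on another
        //    machine fences off the old one (requires the transactions API calls).
        p.setProperty("transactional.id", transactionalId);
        return p;
    }

    static Properties consumerProps() {
        Properties c = new Properties();
        c.setProperty("bootstrap.servers", "localhost:9092"); // hypothetical broker address
        c.setProperty("group.id", "join-reader");             // hypothetical group id
        // 3. Only return transactional records once their transaction has committed.
        c.setProperty("isolation.level", "read_committed");
        // 4. Commit offsets more often than the 5000 ms default; this shrinks,
        //    but does not remove, the window in which a crash causes reprocessing.
        c.setProperty("enable.auto.commit", "true");
        c.setProperty("auto.commit.interval.ms", "100");
        return c;
    }

    public static void main(String[] args) {
        Properties p = producerProps("join-writer-1"); // hypothetical transactional id
        Properties c = consumerProps();
        System.out.println("producer: idempotence=" + p.getProperty("enable.idempotence")
                + ", transactional.id=" + p.getProperty("transactional.id"));
        System.out.println("consumer: isolation.level=" + c.getProperty("isolation.level")
                + ", auto.commit.interval.ms=" + c.getProperty("auto.commit.interval.ms"));
    }
}
```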