Atul Bhatia

Reputation: 1795

Confused about Kafka exactly-once semantics

So I've been reading about Kafka's exactly-once semantics, and I'm a bit confused about how it works.

I understand how the producer avoids sending duplicate messages (in case the ack from the broker is lost), but what I don't understand is how exactly-once works in the scenario where the consumer processes the message but then crashes before committing the offset. Won't Kafka redeliver the message in that scenario?

Upvotes: 4

Views: 1979

Answers (2)

Yannick

Reputation: 1418

radai explained it well in their answer regarding exactly-once within an isolated Kafka cluster.

When dealing with an external database (a transactional one, at least), one easy way to achieve exactly-once is to UPDATE a single row, within one DBMS transaction, with your business value AND the partition/offset it came from. That way, if your consumer crashes before committing to Kafka, you'll be able to recover the last Kafka offset it processed (by using consumer.seek()).

This can add quite a bit of data overhead in your DBMS (keeping the offset/partition for all your rows), but you might be able to optimize it a bit.
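Here's a minimal sketch of that pattern in Java. It assumes PostgreSQL (for the ON CONFLICT upsert syntax), a single manually-assigned partition, and the kafka-clients consumer API; the results table, its columns, the topic name, and the connection details are all hypothetical:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class DbBackedConsumer {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "db-backed-demo");
            props.put("enable.auto.commit", "false"); // offsets live in the database, not in Kafka
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            Connection db = DriverManager.getConnection("jdbc:postgresql://localhost/app", "app", "secret");
            db.setAutoCommit(false);

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                TopicPartition tp = new TopicPartition("orders", 0);
                consumer.assign(Collections.singletonList(tp));

                // On (re)start, resume from the highest offset stored alongside the data.
                try (PreparedStatement ps = db.prepareStatement(
                        "SELECT MAX(kafka_offset) FROM results WHERE kafka_partition = ?")) {
                    ps.setInt(1, tp.partition());
                    ResultSet rs = ps.executeQuery();
                    if (rs.next() && rs.getObject(1) != null) {
                        consumer.seek(tp, rs.getLong(1) + 1); // first unprocessed offset
                    }
                }

                while (true) {
                    for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofSeconds(1))) {
                        // The business value and the partition/offset it came from land in the
                        // same row, in the same DB transaction: they commit or roll back together.
                        try (PreparedStatement ps = db.prepareStatement(
                                "INSERT INTO results (id, value, kafka_partition, kafka_offset) "
                                        + "VALUES (?, ?, ?, ?) ON CONFLICT (id) DO UPDATE SET "
                                        + "value = EXCLUDED.value, kafka_partition = EXCLUDED.kafka_partition, "
                                        + "kafka_offset = EXCLUDED.kafka_offset")) {
                            ps.setString(1, rec.key());
                            ps.setString(2, rec.value());
                            ps.setInt(3, rec.partition());
                            ps.setLong(4, rec.offset());
                            ps.executeUpdate();
                        }
                        db.commit();
                    }
                }
            }
        }
    }

The point is that a crash can land on either side of db.commit() but never in between: if it dies before, the row and the offset are both rolled back and the record is reprocessed; if it dies after, the startup seek() skips past it.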

Yannick

Upvotes: 1

radai

Reputation: 24202

Here's what I think you mean:

  1. consumer X sees record Y, and "acts" on it, yet does not commit its offset
  2. consumer X crashes (still without committing its offsets)
  3. consumer X boots back up, is re-assigned the same partition (not guaranteed) and eventually sees record Y again

This is totally possible. However, for Kafka exactly-once to "work", all of your side effects (state, output) must also go into the same Kafka cluster. So here's what's going to happen (there's a code sketch after the list):

  1. consumer X starts a transaction
  2. consumer X sees record Y, emits some output record Z (as part of the transaction started in 1)
  3. consumer X crashes. shortly after, the broker acting as the transaction coordinator "rolls back" (I'm simplifying) the transaction started in 1, meaning no other Kafka consumer reading with isolation.level=read_committed will ever see record Z
  4. consumer X boots back up, is assigned the same partition(s) as before, starts a new transaction
  5. consumer X sees record Y again, emits record Z2 (as part of the transaction started in 4)
  6. some time later consumer X commits its offsets (as part of the transaction from 4) and then commits that transaction
  7. record Z2 becomes visible to downstream consumers.
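To make those steps concrete, here's a minimal consume-transform-produce sketch using the Java client's transactional API (the groupMetadata() overload of sendOffsetsToTransaction needs Kafka 2.5+); the topic names, the transactional.id, and transform() are hypothetical stand-ins:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.KafkaException;
    import org.apache.kafka.common.TopicPartition;

    public class ExactlyOncePipeline {
        public static void main(String[] args) {
            Properties cp = new Properties();
            cp.put("bootstrap.servers", "localhost:9092");
            cp.put("group.id", "eos-demo");
            cp.put("enable.auto.commit", "false");       // offsets are committed inside the transaction
            cp.put("isolation.level", "read_committed"); // never read records from aborted transactions
            cp.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            cp.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            Properties pp = new Properties();
            pp.put("bootstrap.servers", "localhost:9092");
            pp.put("transactional.id", "eos-demo-1"); // stable id so a restart fences off "zombie" instances
            pp.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            pp.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cp);
                 KafkaProducer<String, String> producer = new KafkaProducer<>(pp)) {
                consumer.subscribe(Collections.singletonList("input"));
                producer.initTransactions(); // also aborts any transaction a crashed predecessor left open

                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    if (records.isEmpty()) continue;

                    producer.beginTransaction();
                    try {
                        Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                        for (ConsumerRecord<String, String> rec : records) {
                            // see record Y, emit record Z as part of the open transaction
                            producer.send(new ProducerRecord<>("output", rec.key(), transform(rec.value())));
                            offsets.put(new TopicPartition(rec.topic(), rec.partition()),
                                        new OffsetAndMetadata(rec.offset() + 1));
                        }
                        // the consumed offsets ride in the same transaction as the output records
                        producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
                        producer.commitTransaction(); // only now does Z become visible downstream
                    } catch (KafkaException e) {
                        // nothing produced in this transaction is ever seen by read_committed consumers
                        // (fencing/fatal errors would require closing the producer instead; omitted here)
                        producer.abortTransaction();
                    }
                }
            }
        }

        private static String transform(String v) {
            return v.toUpperCase(); // stand-in for real processing
        }
    }

Note the read_committed isolation level on the downstream side: without it, consumers would still see records from aborted transactions.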

If you have side effects outside of that same Kafka cluster (say, instead of record Z you insert a row into MySQL), there's no general way to make Kafka exactly-once work for you. You'd need to rely on old-school dedup and idempotence, which is essentially the pattern sketched in Yannick's answer above.

Upvotes: 7
