Reputation: 1795
So I've been reading about Kafka's exactly-once semantics, and I'm a bit confused about how it works.
I understand how the producer avoids sending duplicate messages (in case the ack from the broker fails), but what I don't understand is how exactly-once works in the scenario where the consumer processes the message but then crashes before committing the offset. Won't Kafka retry in that scenario?
Upvotes: 4
Views: 1979
Reputation: 1418
Radal explained it well in their answer, regarding exactly-once within an isolated Kafka cluster.
When dealing with an external (at least transactional) database, one easy way to achieve exactly-once is to UPDATE a row, in a single DB transaction, with your business value AND the partition/offset it came from. That way, if your consumer crashes before committing to Kafka, you'll be able to get back the last Kafka offset it processed (by using consumer.seek()).
It can add quite a bit of data overhead in your DBMS (keeping the offset/partition for all your rows), but you might be able to optimize it a bit.
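A minimal sketch of that pattern, using sqlite3 as a stand-in for the external database (table and column names are made up for illustration): the business value and the source partition/offset go into the same DB transaction, so a crash before the Kafka commit is recoverable.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE results (
    id INTEGER PRIMARY KEY,
    value TEXT,
    part INTEGER,
    kafka_offset INTEGER)""")

def process(value, part, offset):
    # One DB transaction covers BOTH the business write and the offset,
    # so they are persisted together or not at all.
    with db:
        db.execute(
            "INSERT INTO results (value, part, kafka_offset) VALUES (?, ?, ?)",
            (value, part, offset))

# Simulate processing three records from partition 0, then "crashing"
# before the Kafka offset commit ever happens.
for off, val in enumerate(["a", "b", "c"]):
    process(val, 0, off)

# On restart, recover the last processed offset from the DB and resume
# from offset + 1 (with the real client you would call consumer.seek() here).
row = db.execute(
    "SELECT kafka_offset FROM results WHERE part = 0 "
    "ORDER BY kafka_offset DESC LIMIT 1").fetchone()
resume_from = row[0] + 1
```

Since the offset lives in the same transaction as the result row, the DB, not Kafka's committed offset, is the source of truth after a crash.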
Yannick
Upvotes: 1
Reputation: 24202
here's what i think you mean: you consume a message, process it (producing side effects), but crash before committing the offset.
this is totally possible. however, for kafka exactly once to "work" all of your side effects (state, output) must also go into the same kafka cluster. in that case the record(s) you produce and the consumer offsets you commit are part of one transaction: either everything lands or nothing does, so after a crash you just reprocess from the last committed offset without producing duplicates.
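A toy simulation of that atomic commit (invented names, not the real Kafka client API): the produced record and the offset commit apply together, so a crash before commit leaves neither, and reprocessing after restart creates no duplicates.

```python
class TinyCluster:
    """Stand-in for a Kafka cluster that commits output + offset atomically."""
    def __init__(self):
        self.topic = []            # committed records on the output topic
        self.committed_offset = 0  # committed consumer offset

    def commit_txn(self, record, new_offset):
        # Both effects apply together, which is what a transactional
        # commit guarantees for read_committed consumers.
        self.topic.append(record)
        self.committed_offset = new_offset

def run_consumer(cluster, source, crash_before_commit=False):
    # Resume from the committed offset, like a real consumer would.
    for offset in range(cluster.committed_offset, len(source)):
        result = source[offset].upper()  # "process" the record
        if crash_before_commit:
            return  # crash: neither the output nor the offset was committed
        cluster.commit_txn(result, offset + 1)

cluster = TinyCluster()
source = ["a", "b", "c"]

# First attempt crashes before the transaction commits: no partial output.
run_consumer(cluster, source, crash_before_commit=True)
# After restart, reprocessing starts from offset 0 with no duplicates.
run_consumer(cluster, source)
```

The point of the simulation is the invariant, not the API: because output and offset move together, "crash then retry" is indistinguishable from "processed once".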
if you have side-effects outside of the same kafka cluster (say instead of record Z you insert a row into mysql) there's no general way to make kafka exactly-once work for you. you'd need to rely on oldschool dedup and idempotence.
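One common shape for that dedup, sketched with sqlite3 standing in for mysql (table and column names are made up): a unique key on the source coordinates makes a redelivered record a no-op, turning at-least-once delivery into effectively-once at the sink.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE sink (
    part INTEGER,
    kafka_offset INTEGER,
    value TEXT,
    PRIMARY KEY (part, kafka_offset))""")

def write_idempotent(part, offset, value):
    # OR IGNORE drops the insert if (part, offset) was already written,
    # so processing the same record twice leaves the sink unchanged.
    with db:
        db.execute(
            "INSERT OR IGNORE INTO sink (part, kafka_offset, value) "
            "VALUES (?, ?, ?)",
            (part, offset, value))

# Kafka redelivers offset 1 after a crash; the duplicate write is ignored.
deliveries = [(0, 0, "a"), (0, 1, "b"), (0, 1, "b"), (0, 2, "c")]
for part, off, val in deliveries:
    write_idempotent(part, off, val)
```

The same idea works in mysql with `INSERT ... ON DUPLICATE KEY UPDATE` or `INSERT IGNORE`; the key design choice is that the dedup key is derived from the message's coordinates, not its payload.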
Upvotes: 7