Reputation: 9028
The Kafka documentation says that an idempotent producer is only possible within the same producer session, and I am unable to understand this.
Say Kafka adds a sequence number to each message and the last sequence number is maintained in Kafka (I am not sure where it is kept).
How does it generate the sequence number, and where does it keep it?
Why is it not able to maintain the sequence when the producer crashes and comes up again?
How can I make it truly idempotent across producer sessions?
Upvotes: 9
Views: 6272
Reputation: 19218
This is a feature that is sorely missing from Kafka, and I don't see an elegant and efficient way to solve it without modifying Kafka itself.
As a preliminary, if you want true idempotency across any failure (producer or broker), then you absolutely positively need some kind of id in the business layer (rather than the lower level transport layer).
What you could do with such an id in Kafka is this: your producer writes to a topic at least once, and a Kafka Streams process deduplicates messages from that topic using your business layer id and publishes the remaining unique messages to another topic. To be efficient, you should use a monotonically increasing id, aka a sequence number. Otherwise you would have to keep around (and persist) every id you have ever seen, which amounts to a memory leak, unless you restrict deduplication to the last x days / hours / minutes and retain only the ids from that window.
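To illustrate why a monotonically increasing id keeps the state small, here is a minimal sketch in plain Python (not Kafka Streams). The message shape (source_id, business_seq, payload) and the function name are hypothetical, and it assumes messages from one source arrive in order, as they would within a single partition.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical message shape: (source_id, business_seq, payload).
Message = Tuple[str, int, str]


def deduplicate(messages: Iterable[Message]) -> Iterator[Message]:
    """Yield each message once, assuming ids per source only ever increase."""
    highest_seen: Dict[str, int] = {}  # one integer of state per source
    for source_id, seq, payload in messages:
        if seq <= highest_seen.get(source_id, -1):
            continue  # already seen: a duplicate from an at-least-once retry
        highest_seen[source_id] = seq
        yield (source_id, seq, payload)


incoming = [
    ("orders", 0, "a"),
    ("orders", 1, "b"),
    ("orders", 1, "b"),  # retried delivery
    ("orders", 2, "c"),
    ("orders", 0, "a"),  # late duplicate
]
unique = list(deduplicate(incoming))
```

The only state is the highest id seen per source, so nothing grows with the number of messages. With arbitrary (non-monotonic) ids you would need a set of every id ever seen instead.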
Alternatively, give Apache Pulsar a try. Besides addressing other sore spots of Kafka (having to do a costly, manual, error-prone rebalance in order to scale out a topic, to name just one), it has this feature built in.
Upvotes: 3
Reputation: 657
The "idempotent" configuration only works as long as the producer does not crash.
With transactions, however, you can send data across different partitions exactly once. You set a transactional id on your producer, and Kafka ties it to the producer id (which is created automatically). If a new producer instance shows up with the same transactional id, Kafka treats the previous instance as dead: it fences that instance off and aborts any transaction it left open. As a result, the records are written exactly once.
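The fencing idea can be sketched with a toy model. This is not Kafka's API: ToyCoordinator and its methods are made up for illustration, and it assumes the broker side keeps a (producer id, epoch) pair per transactional id and bumps the epoch on every re-registration, which is my understanding of how the real mechanism works.

```python
from typing import Dict, Tuple


class ToyCoordinator:
    """Toy stand-in for broker-side transaction bookkeeping (not Kafka's API)."""

    def __init__(self) -> None:
        self._next_pid = 0
        # transactional id -> (producer id, epoch)
        self._registry: Dict[str, Tuple[int, int]] = {}

    def init_producer(self, transactional_id: str) -> Tuple[int, int]:
        """Register a producer instance; re-registering bumps the epoch."""
        if transactional_id in self._registry:
            pid, epoch = self._registry[transactional_id]
            self._registry[transactional_id] = (pid, epoch + 1)
        else:
            self._registry[transactional_id] = (self._next_pid, 0)
            self._next_pid += 1
        return self._registry[transactional_id]

    def accepts(self, transactional_id: str, pid: int, epoch: int) -> bool:
        """Only the most recently registered instance may write."""
        return self._registry.get(transactional_id) == (pid, epoch)


coordinator = ToyCoordinator()
old = coordinator.init_producer("payments-1")  # first instance
new = coordinator.init_producer("payments-1")  # restart, same transactional id
old_accepted = coordinator.accepts("payments-1", *old)
new_accepted = coordinator.accepts("payments-1", *new)
```

Here the restarted instance is accepted and the stale one is rejected, which is what stops a "zombie" producer from writing duplicates after its replacement has started.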
Upvotes: 3
Reputation: 26885
The Idempotent Producer only has guarantees within the life of the Producer process. If it crashes, the new Idempotent Producer will have a different ProducerId and will start its own sequence.
The sequence number simply starts from 0 and monotonically increases for each record. If a record fails to be delivered, it is sent again with its existing sequence number so it can be deduplicated (if needed) by the brokers. The sequence number is per producer and per partition.
Currently Kafka does not offer a way to "continue" an Idempotent Producer session. Each time you start one, it gets a new and unique ProducerId (generated by the cluster).
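A toy simulation may make this concrete. ToyBroker is hypothetical and heavily simplified (real brokers also reject gaps in the sequence, which this sketch ignores), but it shows both the per-(ProducerId, partition) deduplication and why a restart defeats it.

```python
import itertools
from typing import Dict, List, Tuple


class ToyBroker:
    """Toy broker that deduplicates on (producer id, partition, sequence)."""

    def __init__(self) -> None:
        self._pid_counter = itertools.count()
        # (producer id, partition) -> last sequence number appended
        self._last_seq: Dict[Tuple[int, int], int] = {}
        self.log: List[str] = []

    def new_producer_id(self) -> int:
        """Every producer start gets a fresh, unique id."""
        return next(self._pid_counter)

    def append(self, pid: int, partition: int, seq: int, value: str) -> bool:
        """Append unless this sequence number was already seen for the pair."""
        key = (pid, partition)
        if seq <= self._last_seq.get(key, -1):
            return False  # retry of a record that was already written
        self._last_seq[key] = seq
        self.log.append(value)
        return True


broker = ToyBroker()

pid_a = broker.new_producer_id()
first = broker.append(pid_a, 0, 0, "order-1")   # written
retry = broker.append(pid_a, 0, 0, "order-1")   # same pid and seq: dropped

# The producer crashes and restarts: new id, and its sequence starts at 0 again.
pid_b = broker.new_producer_id()
after_restart = broker.append(pid_b, 0, 0, "order-1")  # not recognised as a duplicate
```

After the restart the broker has no way to link pid_b's sequence 0 to pid_a's sequence 0, so "order-1" ends up in the log twice.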
Upvotes: 11