Reputation: 3059
How does Kafka guarantee that consumers don't read a single message twice?
Or is the above scenario possible? Could the same message be read twice, either by a single consumer or by multiple consumers?
Upvotes: 4
Views: 14265
Reputation: 3852
There are many scenarios that can cause a consumer to consume duplicate messages.
To guarantee that no duplicate messages are consumed, the job's execution and the offset commit must be atomic; that is what provides exactly-once delivery semantics on the consumer side. You can use the parameters below to achieve exactly-once semantics, but please understand that this comes with a performance trade-off.
In Kafka Streams, the above can be achieved by setting the processing.guarantee configuration to exactly_once, which makes each processing step and its offset commit a single unit transaction; see the sketch below.
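For illustration, a minimal sketch of a Streams app with that guarantee turned on (application id, broker address, and topic names are placeholders; on clients older than Kafka 2.8, use StreamsConfig.EXACTLY_ONCE instead of EXACTLY_ONCE_V2):

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class ExactlyOnceStreamsApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "eos-demo");          // placeholder id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        // The key setting: processing, offset commits, and produced results
        // are wrapped in one Kafka transaction per commit interval.
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic").to("output-topic"); // placeholder topics

        new KafkaStreams(builder.build(), props).start();
    }
}
```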
Idempotent (enable.idempotence)
Idempotent delivery enables the producer to write a message to Kafka exactly once to a particular partition of a topic during the lifetime of a single producer, without data loss and with ordering preserved per partition.
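A minimal sketch of turning this on with the Java producer (broker address and topic are placeholders); note that enable.idempotence also forces acks=all:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class IdempotentProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        // The broker de-duplicates retried sends using a producer id and
        // per-partition sequence numbers, so retries cannot create duplicates.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "value")); // placeholder topic
        }
    }
}
```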
Transaction (isolation.level)
Transactions give us the ability to atomically update data in multiple topic partitions: all the records included in a transaction will be successfully saved, or none of them will be. They also allow you to commit your consumer offsets in the same transaction as the data you have processed, thereby enabling end-to-end exactly-once semantics.
Writes inside a transaction are sent to Kafka as they happen rather than being held back until the transaction completes; the producer brackets them with beginTransaction, commitTransaction, and abortTransaction (in case of failure), and the consumer sets isolation.level to either read_committed or read_uncommitted.
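A sketch of the consume-transform-produce loop these APIs enable, with offsets committed inside the transaction (all names are placeholders; the sendOffsetsToTransaction overload taking ConsumerGroupMetadata requires client 2.5+):

```java
import java.time.Duration;
import java.util.*;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.TopicPartition;

public class TransactionalCopy {
    public static void main(String[] args) {
        Properties prodProps = new Properties();
        prodProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        prodProps.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "copy-tx-1");        // stable, unique per producer
        prodProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        prodProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");

        Properties consProps = new Properties();
        consProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder
        consProps.put(ConsumerConfig.GROUP_ID_CONFIG, "copy-group");               // placeholder
        consProps.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");          // offsets go in the transaction
        consProps.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");    // skip aborted records
        consProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        consProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(prodProps);
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consProps);
        producer.initTransactions();
        consumer.subscribe(Collections.singletonList("input-topic")); // placeholder

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            if (records.isEmpty()) continue;
            producer.beginTransaction();
            try {
                Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                for (ConsumerRecord<String, String> rec : records) {
                    producer.send(new ProducerRecord<>("output-topic", rec.key(), rec.value()));
                    offsets.put(new TopicPartition(rec.topic(), rec.partition()),
                            new OffsetAndMetadata(rec.offset() + 1)); // next offset to read
                }
                // Offsets are committed atomically with the produced records.
                producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
                producer.commitTransaction();
            } catch (Exception e) {
                // Nothing from this transaction becomes visible to read_committed
                // consumers; a real app would also rewind the consumer and handle
                // fatal errors such as ProducerFencedException.
                producer.abortTransaction();
            }
        }
    }
}
```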
Please refer to this reference for more detail.
Upvotes: 6
Reputation: 3579
It is absolutely possible if you don't make your consumption process idempotent.
For example, suppose you implement at-least-once delivery semantics: you first process messages and then commit offsets. The commit can fail because of a server failure or a rebalance (your consumer's partitions may be revoked at that moment), so on the next poll you will get the same messages again, as in the sketch below.
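A minimal sketch of that process-then-commit pattern and where the duplicate window sits (broker, group, topic, and process() are hypothetical):

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AtLeastOnceConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "at-least-once-group");     // hypothetical
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));        // hypothetical
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> rec : records) {
                    process(rec); // side effects happen first
                }
                // If the process crashes or the partition is revoked in a rebalance
                // BEFORE this commit, the offsets are never saved, and the same
                // records are delivered again on the next poll: duplicates.
                consumer.commitSync();
            }
        }
    }

    static void process(ConsumerRecord<String, String> rec) { /* hypothetical */ }
}
```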
Upvotes: 3
Reputation: 9313
To be precise, here is what Kafka guarantees regarding consuming messages: consumers keep track of their progress in a partition by saving the last offset read in an internal compacted Kafka topic (__consumer_offsets).
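For illustration, a hypothetical check of a group's committed offset with the Java client (all names are placeholders; KafkaConsumer.committed(Set) requires client 2.4+):

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class CommittedOffsetCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");                // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("my-topic", 0);            // placeholder
            OffsetAndMetadata committed = consumer.committed(Collections.singleton(tp)).get(tp);
            // The group's progress, read back from the internal __consumer_offsets topic.
            System.out.println("committed: " + (committed == null ? "none" : committed.offset()));
        }
    }
}
```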
Kafka consumers can automatically commit offsets if enable.auto.commit is enabled. However, that will give "at most once" semantics. Hence, the flag is usually disabled and the developer commits the offset explicitly once the processing is complete, as in the sketch below.
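A sketch of that manual pattern, committing the next offset to read for each record's partition once processing is done (names and process() are hypothetical):

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;

public class ManualCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "manual-commit-group");     // placeholder
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");         // we commit ourselves
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));        // placeholder
            while (true) {
                for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofMillis(500))) {
                    process(rec); // hypothetical processing step
                    // Commit only after processing completes: the committed value
                    // is the offset of the *next* record to read, hence offset + 1.
                    consumer.commitSync(Collections.singletonMap(
                            new TopicPartition(rec.topic(), rec.partition()),
                            new OffsetAndMetadata(rec.offset() + 1)));
                }
            }
        }
    }

    static void process(ConsumerRecord<String, String> rec) { /* hypothetical */ }
}
```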
Upvotes: 1