J.J. Beam

Reputation: 3059

How does Kafka guarantee that consumers don't read a single message twice?

Or is the above scenario possible? Could the same message be read twice by a single consumer or by multiple consumers?

Upvotes: 4

Views: 14265

Answers (3)

Nitin

Reputation: 3852

There are several scenarios that can cause a consumer to consume duplicate messages:

  1. The producer published the message successfully, but the acknowledgement was lost, so it retries and publishes the same message again.
  2. The producer publishes a batch of messages, but only part of the batch is written successfully. It retries and resends the whole batch, which creates duplicates.
  3. Consumers receive a batch of messages from Kafka and commit their offsets manually (enable.auto.commit=false). If a consumer fails before committing its offsets to Kafka, it will consume the same records again on the next poll, producing duplicates on the consumer side.

To guarantee that duplicate messages are not consumed, the processing of a record and the committing of its offset must be atomic; this is what exactly-once delivery semantics means on the consumer side. You can use the parameters below to achieve exactly-once semantics (a configuration sketch follows the list), but be aware that this comes with a performance trade-off.

  1. Enable idempotence on the producer side, which guarantees that the same message is not published twice: enable.idempotence=true
  2. Use transactions and set the consumer's isolation level to read committed: isolation.level=read_committed
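
As a minimal configuration sketch with the Java client (the broker address and transactional id below are placeholders, not values from this answer), the two settings map onto producer and consumer properties roughly like this:

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.producer.ProducerConfig;

    public class ExactlyOnceConfigSketch {
        public static void main(String[] args) {
            // Producer side: idempotence prevents duplicates caused by internal retries.
            Properties producerProps = new Properties();
            producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
            producerProps.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");          // enable.idempotence=true
            producerProps.put(ProducerConfig.ACKS_CONFIG, "all");                         // required with idempotence
            producerProps.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "my-tx-id");        // placeholder, needed for transactions

            // Consumer side: only read records from committed transactions.
            Properties consumerProps = new Properties();
            consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
            consumerProps.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");   // isolation.level=read_committed
            consumerProps.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");         // commit offsets with the transaction
        }
    }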

In Kafka Streams, the above settings can be achieved by enabling exactly-once semantics, which turns each consume-process-produce cycle into a single transaction.
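
In the Streams configuration this is the processing.guarantee setting; a minimal sketch, where the application id and broker address are placeholders:

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;

    public class StreamsExactlyOnceSketch {
        public static void main(String[] args) {
            Properties streamsProps = new Properties();
            streamsProps.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");    // placeholder
            streamsProps.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            // Wraps each consume-process-produce cycle in a single transaction.
            streamsProps.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
        }
    }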

Idempotent

Idempotent delivery enables the producer to write a message to Kafka exactly once to a particular partition of a topic during the lifetime of a single producer, without data loss and while preserving ordering per partition.

Transaction (isolation.level)

Transactions give us the ability to atomically update data in multiple topic partitions. All the records included in a transaction will be successfully saved, or none of them will be. It allows you to commit your consumer offsets in the same transaction along with the data you have processed, thereby allowing end-to-end exactly-once semantics.

The producer doesn't wait for each individual write to Kafka; it uses beginTransaction, commitTransaction, and abortTransaction (in case of failure) to demarcate the transaction. The consumer uses isolation.level, set to either read_committed or read_uncommitted (a sketch of the producer-side flow follows the list):

  • read_committed: Consumers will always read committed data only.
  • read_uncommitted: Read all messages in offset order without waiting for transactions to be committed
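
A hedged sketch of that producer-side transaction flow (the topic name, group id, and offset map below are illustrative placeholders; error handling is simplified):

    import java.util.Map;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.KafkaException;
    import org.apache.kafka.common.TopicPartition;

    public class TransactionalProducerSketch {
        // offsetsToCommit would be built from the records the application just processed.
        static void publishAtomically(KafkaProducer<String, String> producer,
                                      Map<TopicPartition, OffsetAndMetadata> offsetsToCommit) {
            producer.initTransactions(); // called once per producer instance; shown here for completeness
            try {
                producer.beginTransaction();
                producer.send(new ProducerRecord<>("output-topic", "key", "value"));     // placeholder record
                producer.sendOffsetsToTransaction(offsetsToCommit, "my-consumer-group"); // commit offsets in the same transaction
                producer.commitTransaction();  // data and offsets become visible atomically
            } catch (KafkaException e) {
                producer.abortTransaction();   // read_committed consumers never see the aborted records
            }
        }
    }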

Please refer to the reference for more detail.

Upvotes: 6

H.Ç.T

Reputation: 3579

It is absolutely possible if you don't make your consumption process idempotent.

For example, suppose you implement at-least-once delivery semantics: you first process the messages and then commit the offsets. It is possible that the offsets cannot be committed because of a server failure or a rebalance (maybe your consumer's partitions are revoked at that time). So when you poll again, you will get the same messages twice.
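
A minimal sketch of that at-least-once pattern with the Java consumer (the topic name and processing step are placeholders); the comment marks the window in which a crash or rebalance causes the same records to be delivered again:

    import java.time.Duration;
    import java.util.Collections;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class AtLeastOnceLoopSketch {
        static void run(KafkaConsumer<String, String> consumer) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // your side effect, e.g. writing to a database
                }
                // A crash or partition revocation HERE means the offsets are never committed,
                // so the next poll re-reads the records that were already processed.
                consumer.commitSync();
            }
        }

        static void process(ConsumerRecord<String, String> record) {
            // placeholder for the real processing logic
        }
    }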

Upvotes: 3

Saptarshi Basu

Reputation: 9313

To be precise, this is what Kafka guarantees:

  1. Kafka provides order guarantee of messages in a partition
  2. Produced messages are considered "committed" when they have been written to the partition on all of its in-sync replicas
  3. Messages that are committed will not be lost as long as at least one replica remains alive
  4. Consumers can only read messages that are committed

Regarding consuming messages, the consumers keep track of their progress in a partition by saving the last offset read in an internal compacted Kafka topic.

Kafka consumers can automatically commit the offset if enable.auto.commit is enabled. However, that will give "at most once" semantics. Hence, usually the flag is disabled and the developer commits the offset explicitly once the processing is complete.
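
As a small sketch of that setup (broker address and group id are placeholders), auto-commit is turned off and the application commits explicitly once processing finishes:

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;

    public class ManualCommitConfigSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");                // placeholder group id
            props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");         // developer calls commitSync() after processing
        }
    }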

Upvotes: 1
