Reputation: 9
I am new to Kafka. I have a Java microservice that uses Kafka Streams to consume messages from a Kafka topic produced by a producer and process them. The Kafka commit interval has been set using `auto.commit.interval.ms`. My question is: if the microservice crashes before the commit, what will happen to the messages that got processed but didn't get committed? Will there be duplicate records? And how do I resolve this duplication, if it happens?
Upvotes: 1
Views: 3420
Reputation: 512
There are three main types of delivery semantics:
At most once - Offsets are committed as soon as the message is received by the consumer. It's a bit risky, because if the processing goes wrong, the message is lost.
At least once - Offsets are committed after the message is processed, so this is usually the preferred one. If the processing goes wrong, the message will be read again since its offset was not committed. The problem with this is duplicate processing of messages, so make sure your processing is idempotent (yes, your application has to handle duplicates; Kafka won't help here), meaning that processing a message again will not impact your system.
Exactly once - Can be achieved for Kafka-to-Kafka communication using the Kafka Streams API. It's not your case.
You can choose the semantics above as per your requirement. For at-least-once, the typical pattern is to disable auto-commit and commit offsets manually after processing, as sketched below.
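A minimal sketch of an at-least-once consumer using the plain Kafka consumer API: auto-commit is disabled and offsets are committed only after the batch has been processed. The broker address, group id, and topic name are placeholders for your setup.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AtLeastOnceConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "my-service");              // placeholder
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // Disable auto-commit so offsets are committed only after processing.
        props.put("enable.auto.commit", "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("my-topic")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // your business logic
                }
                // Commit only after the whole batch is processed: at-least-once.
                // A crash before this line means the batch is redelivered.
                consumer.commitSync();
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("Processing %s%n", record.value());
    }
}
```

If the service crashes between `process(record)` and `commitSync()`, those records will be read again on restart, which is exactly why the processing must be idempotent.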
Upvotes: 1
Reputation: 1565
Kafka provides various delivery semantics. Which one to use can be decided on the basis of the use case you've implemented.
If you're concerned that your messages should not get lost by the consumer service, you should go with the at-least-once delivery semantic.

Now, answering your question on the basis of at-least-once delivery semantics:
If your consumer service crashes before committing the offset for a Kafka message, that message will be redelivered once your consumer service is up and running again, because the offset for the partition was not committed. The offset for a partition is committed only after the consumer has processed the message. In simple words, a committed offset says that the message has been processed, and Kafka will not send a committed message for the same partition again.
At-least-once delivery semantics are usually good enough for use cases where data duplication is not a big issue or where deduplication is possible on the consumer side. For example, with a unique key in each message, a duplicate write to the database can be rejected; a sketch of this idea follows.
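A rough sketch of consumer-side deduplication, assuming each message carries a unique key: a message whose key has already been seen is skipped. The in-memory set here only illustrates the idea; in production the check would be a persistent store, e.g. a database unique constraint or an upsert. Class and method names are illustrative.

```java
import java.util.HashSet;
import java.util.Set;

public class DeduplicatingProcessor {
    // Stand-in for a persistent store of already-processed message keys.
    private final Set<String> processedKeys = new HashSet<>();

    /** Processes the message only if its unique key has not been seen before. */
    public void handle(String messageKey, String payload) {
        if (!processedKeys.add(messageKey)) {
            return; // duplicate delivery: skip reprocessing
        }
        // ... write payload to the database, call downstream services, etc.
        System.out.printf("Processed %s -> %s%n", messageKey, payload);
    }
}
```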
Upvotes: 1
Reputation: 5924
Kafka has exactly-once semantics, which guarantee that records are processed only once. Take a look at this section of Spring Kafka's docs for more details on the Spring support for that. Also, see this section for the support for transactions.
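For reference, a minimal sketch of turning on exactly-once processing in a Kafka Streams application via the `processing.guarantee` config (`EXACTLY_ONCE_V2` is available from Kafka 2.8; older versions use `EXACTLY_ONCE`). The application id, bootstrap servers, and topic names are placeholders.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class ExactlyOnceStreams {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");     // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        // Enable exactly-once processing for Kafka-to-Kafka pipelines.
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic").to("output-topic"); // placeholder topics

        new KafkaStreams(builder.build(), props).start();
    }
}
```

Note that this guarantee covers the Kafka-to-Kafka path (consume, process, produce); side effects against external systems still need idempotent handling.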
Upvotes: 1