Evgeniy Berezovsky

Reputation: 19218

Adding to a Kafka topic exactly once

Since 0.11, Kafka Streams offers exactly-once guarantees, but its definition of "end" in end-to-end seems to be "a Kafka topic".

For real-time applications, the first "end" however is generally not a Kafka topic, but some kind of application that outputs data - perhaps going through multiple tiers and networks - to a Kafka topic.

So does Kafka offer something to add to a topic exactly-once, in the face of network failures and application crashes and restarts? Or do I have to use Kafka's at-least-once semantics and deduplicate that topic with potential duplicates into another exactly-once topic, by means of some unique identifier?

Edit: Due to popular demand, here's a specific use case. I have a client C that creates messages and sends them to a server S, which uses a KafkaProducer to add those messages to Kafka topic T.

How can I guarantee, in the face of

- network failures
- crashes and restarts of C and S

that all messages that C creates end up in T, exactly once (and - per partition - in the correct order)?

I would of course make C resend all messages for which it did not get an ack from S -> at-least-once. But to make it exactly-once, the messages that C sends would need to contain some kind of ID, so that deduplication can be performed. How to do that with Kafka, I don't know.
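For concreteness, here is a minimal sketch of that at-least-once-plus-ID approach, assuming S uses the Java KafkaProducer and puts the ID generated by C into the record key (broker address, topic name, and serializers are placeholder assumptions):

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    import java.util.Properties;
    import java.util.UUID;

    // Hypothetical sketch: C attaches a stable, unique ID to each message so a
    // downstream step can recognize and drop retransmissions.
    public class DedupIdProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumption
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // C must generate the ID once and reuse it on every resend;
                // otherwise a retransmission looks like a brand-new message.
                String messageId = UUID.randomUUID().toString();
                producer.send(new ProducerRecord<>("T", messageId, "payload"));
            }
        }
    }

The deduplication itself then has to happen somewhere downstream of T, e.g. along the lines the answers below suggest.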

Upvotes: 1

Views: 547

Answers (2)

Akhil Bojedla

Reputation: 2218

You might want to have a look at Kafka's log compaction feature. It will deduplicate messages for you, provided you have a unique key for all the duplicate messages.

https://kafka.apache.org/documentation/#compaction
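As a sketch of that setup, assuming the Java AdminClient and placeholder broker address and partition/replication counts, you could create T as a compacted topic:

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;
    import org.apache.kafka.common.config.TopicConfig;

    import java.util.Collections;
    import java.util.Map;
    import java.util.Properties;

    // Hypothetical sketch: create T as a compacted topic, so that for every
    // record key only the latest value is eventually retained.
    public class CompactedTopicSetup {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumption

            try (AdminClient admin = AdminClient.create(props)) {
                NewTopic topic = new NewTopic("T", 3, (short) 1) // counts are placeholders
                        .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                                TopicConfig.CLEANUP_POLICY_COMPACT));
                admin.createTopics(Collections.singletonList(topic)).all().get();
            }
        }
    }

Each message then has to be produced with its unique ID as the record key. Note that compaction only removes older values for a key when the log cleaner runs, so consumers can still see duplicates in the meantime.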

Update:

Log compaction is not very reliable on its own; however, you can tune some settings to make it work as expected.

The more efficient way is to use Kafka Streams. You can achieve this using KTables, as sketched below.
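A minimal sketch of the KTable idea, assuming string keys and values and hypothetical topic names: the duplicate-laden topic is read as a table, so records sharing the same message-ID key collapse onto one row, and the table's changelog is written back out. A KTable may still forward one update per incoming duplicate unless record caching coalesces them, so this illustrates the idea rather than a complete solution:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KTable;

    import java.util.Properties;

    // Hypothetical sketch: read the at-least-once topic as a KTable and
    // write its changelog to a second topic.
    public class KTableDedup {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "dedup-app"); // assumption
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption
            // Exactly-once processing within the Streams topology.
            props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            // Records keyed by the unique message ID collapse onto one row.
            KTable<String, String> deduped = builder.table("T-with-duplicates"); // assumption
            deduped.toStream().to("T-deduplicated"); // assumption

            new KafkaStreams(builder.build(), props).start();
        }
    }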

Upvotes: 0

Matthias J. Sax

Reputation: 62330

Kafka's exactly-once feature, in particular the "idempotent producer", can help you with server crashes and network issues.

You can enable idempotency via the producer config enable.idempotence=true, which you pass in like any other config. This ensures that every message is written exactly once and in the correct order, even if the server crashes or there are network issues.
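A minimal sketch of that configuration with the Java producer (broker address, topic, key, and serializers are assumptions):

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;

    import java.util.Properties;

    // Minimal sketch: with enable.idempotence=true, broker-side retries cannot
    // introduce duplicates or reordering within a partition.
    public class IdempotentProducerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption
            props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("T", "some-id", "payload")); // assumption
            }
        }
    }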

Kafka's exactly-once feature does not provide support if the producer itself crashes. For this case, you would need to write manual code to figure out which messages got appended to the topic successfully before the crash (by using a consumer) and resume sending where you left off. As an alternative, you can still deduplicate on the consumer side, as you mentioned already.
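As a sketch of that manual recovery step, assuming messages carry their unique ID as the record key: assign a plain consumer to a partition of T, seek to the last record, and read back its ID to know where to resume (broker address and partition number are placeholders):

    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    // Hypothetical sketch: after a producer crash, read back the last record of
    // a partition of T to learn which message made it in before the crash.
    public class LastWrittenMessageLookup {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumption
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                TopicPartition tp = new TopicPartition("T", 0); // assumption: partition 0
                consumer.assign(Collections.singletonList(tp));
                consumer.seekToEnd(Collections.singletonList(tp));
                long end = consumer.position(tp);
                if (end > 0) {
                    consumer.seek(tp, end - 1); // step back to the last record
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
                    records.forEach(r -> System.out.println(
                            "last appended message ID (key): " + r.key()));
                }
            }
        }
    }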

Upvotes: 1
