Reputation: 19218
Since 0.11, Kafka Streams offers exactly-once guarantees, but its definition of "end" in end-to-end seems to be "a Kafka topic".
For real-time applications, however, the first "end" is generally not a Kafka topic, but some kind of application that outputs data - perhaps going through multiple tiers and networks - to a Kafka topic.
So does Kafka offer something to add to a topic exactly-once, in the face of network failures and application crashes and restarts? Or do I have to use Kafka's at-least-once semantics and deduplicate that topic with potential duplicates into another exactly-once topic, by means of some unique identifier?
Edit: Due to popular demand, here's a specific use case. I have a client C that creates messages and sends them to a server S, which uses a KafkaProducer to add those messages to Kafka topic T.
How can I guarantee, in the face of
that all messages that C creates end up in T, exactly once (and - per partition - in the correct order)?
I would of course make C resend all messages for which it did not get an ack from S, which gives at-least-once. But to make it exactly-once, the messages that C sends would need to contain some kind of ID, so that deduplication can be performed. I don't know how to do that with Kafka.
Upvotes: 1
Views: 547
Reputation: 2218
You might want to have a look at Kafka's log compaction feature. It will deduplicate messages for you, provided you have a unique key that is shared by all the duplicates of a message.
https://kafka.apache.org/documentation/#compaction
Update:
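Log compaction is not very reliable for this, however: it runs in the background, so duplicates can remain visible until a segment has been compacted. You can change some settings to make it work as expected.
The more efficient way is to use Kafka Streams. You can achieve this using KTables.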
Upvotes: 0
Reputation: 62330
Kafka's exactly-once feature, in particular the "idempotent producer", can help you with server crashes and network issues.
You can enable idempotency via the producer config enable.idempotence=true, which you pass in like any other config. This ensures that every message is written exactly once and in the correct order, even if the server crashes or there are network issues.
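As a minimal sketch of the configuration described above: the class name, the bootstrap address and the String serializers are assumptions chosen for illustration. The example only builds a plain java.util.Properties object so it runs without the Kafka client library; in a real application the same properties would be passed to the KafkaProducer constructor.

```java
import java.util.Properties;

// Hypothetical helper class; only the property names come from Kafka itself.
class IdempotentProducerConfig {

    // bootstrapServers is a placeholder such as "localhost:9092".
    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Serializers are an assumption; pick the ones matching your message types.
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // The setting this answer is about.
        props.setProperty("enable.idempotence", "true");
        // Idempotence requires acks=all; set explicitly here for clarity.
        props.setProperty("acks", "all");
        return props;
    }

    public static void main(String[] args) {
        // Print the resulting configuration, one key=value pair per line.
        build("localhost:9092").forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

With the kafka-clients dependency available, the result of build(...) could be handed to new KafkaProducer<>(props) like any other producer configuration.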
Kafka's exactly-once feature does not help if the producer itself crashes. For this case, you would need to write manual code to figure out which messages were appended to the topic successfully before the crash (by using a consumer) and resume sending where you left off. As an alternative, you can still deduplicate on the consumer side, as you mentioned already.
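The consumer-side deduplication mentioned above could look roughly like the following sketch. It assumes each message carries a unique ID assigned by the client C; the Message and Deduplicator classes are hypothetical and not part of Kafka. The set of seen IDs is kept in memory only, so a real application would need to persist it or bound it to a time window.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical message type: an ID assigned by the client plus a payload.
class Message {
    final String id;
    final String payload;

    Message(String id, String payload) {
        this.id = id;
        this.payload = payload;
    }
}

// Remembers which message IDs have been seen and drops repeats.
class Deduplicator {
    private final Set<String> seenIds = new HashSet<>();

    // Returns true the first time an ID is seen, false for a duplicate.
    boolean firstTimeSeen(String messageId) {
        return seenIds.add(messageId);
    }

    // Keeps the first occurrence of each ID, preserving input order.
    List<Message> dedupe(List<Message> incoming) {
        List<Message> unique = new ArrayList<>();
        for (Message m : incoming) {
            if (firstTimeSeen(m.id)) {
                unique.add(m);
            }
        }
        return unique;
    }
}
```

The deduplicated output could then be written to a second topic, as suggested in the question.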
Upvotes: 1