Reputation: 37095
Suppose I have a Kafka consumer/producer that works as follows:
1. Consume a message from the input topic.
2. Run an expensive computation on the message.
3. Produce the result to the output topic.
4. Commit the consumed offset on the input topic.
The computation in step (2) might fail, but if it succeeds then it will always produce the same result.
Note that "auto-commit" is disabled.
Here is some code, in case that helps:
#r "nuget: Confluent.Kafka, 2.8.0"
open Confluent.Kafka
open System.Threading
open System.Threading.Tasks
let processor
    (expensiveComputation : string -> Task<int>)
    (consumer : IConsumer<string, string>)
    (producer : IProducer<string, int>)
    (outputTopic : string)
    (ct : CancellationToken)
    =
    task {
        while true do
            // (1) Consume the next message
            let! consumeResult = Task.Run (fun () -> consumer.Consume(ct))
            printfn $"Consumed %A{consumeResult.TopicPartitionOffset}"
            // (2) Compute something (might fail!)
            let! computed = expensiveComputation consumeResult.Message.Value
            // (3) Write the result
            let message = Message()
            message.Key <- consumeResult.Message.Key
            message.Value <- computed
            let! deliveryResult = producer.ProduceAsync(outputTopic, message)
            printfn $"Produced %A{deliveryResult.TopicPartitionOffset}"
            // (4) Commit the consumed offset
            consumer.Commit(consumeResult)
            printfn $"Committed %A{consumeResult.TopicPartitionOffset}"
    }
If any of stages 2-4 fails, the process restarts and resumes from the last committed offset. This gives "at least once" processing: a message may be reprocessed, and its result may be written to the output topic more than once.
Is it possible to make this "exactly once" processing in Kafka, such that the consumer only commits when the result has been successfully written to the output topic?
Upvotes: 0
Views: 47
Reputation: 142873
I have not tried this myself yet, but Kafka has transactional semantics and offers exactly-once guarantees in some cases.
From the Exactly-Once Semantics Are Possible: Here’s How Kafka Does It blog post:
Note that exactly-once semantics is guaranteed within the scope of Kafka Streams’ internal processing only; for example, if the event streaming app written in Streams makes an RPC call to update some remote stores, or if it uses a customized client to directly read or write to a Kafka topic, the resulting side effects would not be guaranteed exactly once.
Stream processing systems that only rely on external data systems to materialize state support weaker guarantees for exactly-once stream processing. Even when they use Kafka as a source for stream processing and need to recover from a failure, they can only rewind their Kafka offset to reconsume and reprocess messages, but cannot rollback the associated state in an external system, leading to incorrect results when the state update is not idempotent.
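Your case avoids that caveat, since both the offset commit and the result write go to Kafka topics. On the configuration side, a transactional setup in Confluent.Kafka looks roughly like the sketch below (untested; the bootstrap server, transactional id, and group id are placeholder values):

```fsharp
open System
open Confluent.Kafka

// The producer needs a stable transactional id so the broker can
// fence off "zombie" instances of the same processor.
let producerConfig = ProducerConfig()
producerConfig.BootstrapServers <- "localhost:9092"   // placeholder
producerConfig.TransactionalId <- "my-processor-tx"   // placeholder

// Consumer: auto-commit off (offsets are committed inside the transaction),
// and read_committed so it never sees messages from aborted transactions.
let consumerConfig = ConsumerConfig()
consumerConfig.BootstrapServers <- "localhost:9092"   // placeholder
consumerConfig.GroupId <- "my-processor-group"        // placeholder
consumerConfig.EnableAutoCommit <- Nullable false
consumerConfig.IsolationLevel <- Nullable IsolationLevel.ReadCommitted
```

Note that any downstream consumer of the output topic also needs read_committed isolation to get the exactly-once behaviour.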
There is also an ExactlyOnce example in the .NET Confluent Kafka repo (see the readme, the part about exactly-once processing). In short: you disable auto-commit and use IProducer's transactions API.
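Applied to your processor, the loop might look something like the sketch below. This is untested: it assumes the producer was built with a TransactionalId and the consumer with auto-commit disabled (as above), and the timeouts are arbitrary.

```fsharp
open System
open Confluent.Kafka
open System.Threading
open System.Threading.Tasks

let transactionalProcessor
    (expensiveComputation : string -> Task<int>)
    (consumer : IConsumer<string, string>)
    (producer : IProducer<string, int>)   // built with a TransactionalId
    (outputTopic : string)
    (ct : CancellationToken)
    =
    task {
        // One-time: register with the transaction coordinator and
        // fence off any older instance using the same transactional id.
        producer.InitTransactions(TimeSpan.FromSeconds 30.)
        while true do
            let! consumeResult = Task.Run (fun () -> consumer.Consume(ct))
            producer.BeginTransaction()
            try
                let! computed = expensiveComputation consumeResult.Message.Value
                let message =
                    Message<string, int>(
                        Key = consumeResult.Message.Key,
                        Value = computed)
                let! _ = producer.ProduceAsync(outputTopic, message)
                // Commit the consumed offset as part of the same transaction.
                // The committed offset is the *next* position to read from.
                let offsets =
                    [ TopicPartitionOffset(
                          consumeResult.TopicPartition,
                          Offset(consumeResult.Offset.Value + 1L)) ]
                producer.SendOffsetsToTransaction(
                    offsets, consumer.ConsumerGroupMetadata, TimeSpan.FromSeconds 30.)
                // The output message and the offset commit become visible atomically.
                producer.CommitTransaction()
            with ex ->
                // Roll back both the produced message and the offset commit.
                producer.AbortTransaction()
                raise ex
    }
```

One caveat I am not certain about: after an abort, the consumer's in-memory position has already advanced past the failed message, so on retry you may also need to seek the consumer back to the last committed offset before consuming again.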
Upvotes: 0