Reputation: 5210
Is it possible to do exactly-once processing from a Kafka source with KafkaIO
in Beam? There is a consumer config flag called AUTO_COMMIT
that can be set, but it seems to commit offsets back to Kafka right after the consumer reads the data, rather than after the pipeline has finished processing the message.
Upvotes: 3
Views: 1192
Reputation: 814
Yes. Beam runners like Dataflow and Flink store the processed offsets in internal state, so exactly-once processing does not depend on 'AUTO_COMMIT' in the Kafka consumer config. The internal state is checkpointed atomically with processing (the actual details depend on the runner).
There are a few more options to achieve end-to-end exactly-once semantics (from source to Beam application to sink). The KafkaIO source provides an option to read only committed records, and KafkaIO also supports an exactly-once sink.
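As a rough sketch, those two options look something like the following (hedged: the broker address, topic names, and shard count are placeholders, and the exact method names may vary slightly between Beam versions — `withReadCommitted()` and `withEOS(...)` are the KafkaIO methods I believe apply here):

```java
// Source: read only records committed by transactional producers.
PCollection<KafkaRecord<Long, String>> input = pipeline.apply(
    KafkaIO.<Long, String>read()
        .withBootstrapServers("broker:9092")       // placeholder address
        .withTopic("input-topic")                  // placeholder topic
        .withKeyDeserializer(LongDeserializer.class)
        .withValueDeserializer(StringDeserializer.class)
        .withReadCommitted());                     // skip aborted/uncommitted records

// Sink: exactly-once writes using Kafka transactions.
output.apply(
    KafkaIO.<Long, String>write()
        .withBootstrapServers("broker:9092")
        .withTopic("output-topic")                 // placeholder topic
        .withKeySerializer(LongSerializer.class)
        .withValueSerializer(StringSerializer.class)
        .withEOS(20, "eos-sink-group-id"));        // shard count and group id are illustrative
```

Note that the exactly-once sink is only supported on runners that support the required checkpointing semantics (e.g. Dataflow), and it relies on Kafka transactions, so the brokers must be a version that supports them.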
Some pipelines do set 'AUTO_COMMIT', mainly so that when a pipeline is restarted from scratch (as opposed to updated, which preserves internal state), it resumes roughly where the old pipeline left off. As you mentioned, this does not provide any processing guarantees.
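If you want that restart-resumption behavior, a sketch of how it might be configured (hedged: `commitOffsetsInFinalize()` is the KafkaIO-level alternative I'd suggest looking at first, since it commits after checkpoint finalization rather than right after the consumer fetch; the consumer-config route is shown for comparison, and the group id is a placeholder):

```java
// Option A: let KafkaIO commit offsets when checkpoints are finalized.
KafkaIO.<Long, String>read()
    .withBootstrapServers("broker:9092")
    .withTopic("input-topic")
    .withKeyDeserializer(LongDeserializer.class)
    .withValueDeserializer(StringDeserializer.class)
    .commitOffsetsInFinalize();   // commits lag behind processing, but track it

// Option B: plain Kafka consumer auto-commit (no processing guarantees).
KafkaIO.<Long, String>read()
    .withBootstrapServers("broker:9092")
    .withTopic("input-topic")
    .withKeyDeserializer(LongDeserializer.class)
    .withValueDeserializer(StringDeserializer.class)
    .withConsumerConfigUpdates(ImmutableMap.of(
        ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, true,
        ConsumerConfig.GROUP_ID_CONFIG, "my-pipeline-group"));  // placeholder
```

Either way, the committed offsets only affect where a fresh pipeline starts reading; the runner's checkpointed state remains the source of truth for exactly-once processing within a running pipeline.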
Upvotes: 3