jamborta

Reputation: 5210

Exactly once from Kafka source in Apache Beam

Is it possible to do exactly-once processing using a Kafka source, with KafkaIO in Beam? There is a flag that can be set called AUTO_COMMIT, but it seems that it commits back to Kafka straight after the consumer processed the data, rather than after the pipeline completed processing the message.

Upvotes: 3

Views: 1192

Answers (1)

Raghu Angadi

Reputation: 814

Yes. Beam runners like Dataflow and Flink store the processed offsets in internal state, so exactly-once processing is not related to 'AUTO_COMMIT' in the Kafka consumer config. That internal state is checkpointed atomically with processing (the exact details depend on the runner).
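To illustrate, a minimal KafkaIO read could look like the sketch below (the broker address and topic name are placeholders). Note there is no offset-commit configuration at all: the runner tracks consumed offsets in its own checkpointed state.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReadFromKafka {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    p.apply(KafkaIO.<String, String>read()
        .withBootstrapServers("broker:9092")   // placeholder address
        .withTopic("my-topic")                 // placeholder topic
        .withKeyDeserializer(StringDeserializer.class)
        .withValueDeserializer(StringDeserializer.class)
        .withoutMetadata());                   // drop Kafka metadata, keep KV<key, value>

    p.run().waitUntilFinish();
  }
}
```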

There are more options to achieve end-to-end exactly-once semantics (from the source through the Beam application to the sink). The KafkaIO source provides an option to read only committed records, and KafkaIO also supports an exactly-once sink.
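A sketch of both options, using KafkaIO's `withReadCommitted()` on the read side and `withEOS()` on the write side (topic names, broker address, shard count, and sink group id are placeholder values):

```java
// Source side: skip records from uncommitted/aborted Kafka transactions,
// equivalent to setting isolation.level=read_committed on the consumer.
KafkaIO.Read<String, String> source = KafkaIO.<String, String>read()
    .withBootstrapServers("broker:9092")
    .withTopic("in-topic")
    .withKeyDeserializer(StringDeserializer.class)
    .withValueDeserializer(StringDeserializer.class)
    .withReadCommitted();

// Sink side: write each record to Kafka exactly once via Kafka
// transactions. This requires a runner with compatible checkpointing
// support (e.g. Dataflow or Flink).
KafkaIO.Write<String, String> sink = KafkaIO.<String, String>write()
    .withBootstrapServers("broker:9092")
    .withTopic("out-topic")
    .withKeySerializer(StringSerializer.class)
    .withValueSerializer(StringSerializer.class)
    .withEOS(20, "eos-sink-group-id");  // 20 shards; the group id must stay
                                        // stable across pipeline restarts
```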

Some pipelines do set 'AUTO_COMMIT', mainly so that when a pipeline is restarted from scratch (as opposed to updated, which preserves internal state), it resumes roughly where the old pipeline left off. As you mentioned, this does not provide any processing guarantees.
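For completeness, enabling auto-commit through KafkaIO's consumer config might look like this sketch (the group id and topic are placeholders; `withConsumerConfigUpdates` is the current name of the method, older Beam releases called it `updateConsumerProperties`):

```java
// Auto-commit lets a restarted-from-scratch pipeline resume near the old
// position, but offers no processing guarantees: offsets are committed by
// the consumer, not atomically with the pipeline's checkpoints.
KafkaIO.Read<String, String> reader = KafkaIO.<String, String>read()
    .withBootstrapServers("broker:9092")
    .withTopic("my-topic")
    .withKeyDeserializer(StringDeserializer.class)
    .withValueDeserializer(StringDeserializer.class)
    .withConsumerConfigUpdates(ImmutableMap.of(
        ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, true,
        ConsumerConfig.GROUP_ID_CONFIG, "my-group"));  // a group id is
                                                       // required for commits
```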

Upvotes: 3

Related Questions