Reputation: 451
I have two questions regarding the failure handling in Flink's DataSet API:
Why isn't the checkpointing mechanism mentioned in the documentation of the DataSet API?
How are failures handled in the DataSet API, e.g., for a `reduce` or `reduceGroup` transformation?
Upvotes: 0
Views: 140
Reputation: 18987
Flink handles failures differently for streaming and batch programs.
For streaming programs, the input stream is unbounded, so it is generally not possible or feasible to replay the complete input after a failure. Instead, Flink consistently checkpoints the state of operators and user functions and restores that state in case of a failure.
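To make the contrast concrete, here is a minimal sketch of a streaming program with checkpointing enabled; the interval, port, and job name are illustrative assumptions, not values from the question:

```java
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StreamingCheckpointSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Take a consistent snapshot of all operator state every 5 seconds.
        // After a failure, Flink restores the latest completed snapshot and
        // resumes processing from there instead of replaying the whole stream.
        env.enableCheckpointing(5_000L, CheckpointingMode.EXACTLY_ONCE);

        env.socketTextStream("localhost", 9999)
           .filter(line -> !line.isEmpty())
           .print();

        env.execute("Checkpointed streaming sketch");
    }
}
```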
For batch programs, Flink recomputes intermediate results that were lost due to a failure by re-reading the necessary input data and evaluating the relevant transformations again. This applies to all transformations, including `reduce` and `reduceGroup`.
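As a rough sketch of what this means in practice, the hypothetical DataSet job below configures a restart strategy; if a task fails, Flink restarts the job and recomputes the lost results by re-reading the input rather than restoring checkpointed state (the restart parameters and the generated data are illustrative assumptions):

```java
import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class DataSetRecoverySketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Illustrative restart strategy: retry the job up to 3 times,
        // waiting 10 seconds between attempts. On each restart, lost
        // intermediate results are recomputed from the input data.
        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, 10_000L));

        DataSet<Long> numbers = env.generateSequence(1, 1_000_000);

        // If a task executing this reduce fails, Flink re-reads the
        // affected input and re-applies the transformation; there is
        // no checkpointed operator state to restore.
        long sum = numbers.reduce((a, b) -> a + b).collect().get(0);

        System.out.println("Sum: " + sum);
    }
}
```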
Upvotes: 2