Reputation: 451
I have two questions regarding the failure handling in Flink's DataSet API:
Why isn't the checkpointing mechanism mentioned in the documentation of the DataSet API?
How are failures handled in the DataSet API, e.g., for a `reduce` or `reduceGroup` transformation?
Upvotes: 0
Views: 140
Reputation: 18987
Flink handles failures differently for streaming and batch programs.
For streaming programs, the input stream is unbounded, so it is generally not possible or feasible to replay the complete input after a failure. Instead, Flink consistently checkpoints the state of operators and user functions and restores that state in case of a failure.
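To make the contrast concrete, here is a minimal sketch of a streaming program with checkpointing enabled; the interval, port, and job name are illustrative assumptions, not values from the question:

```java
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StreamingCheckpointSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Take a consistent snapshot of all operator state every 5 seconds.
        // After a failure, Flink restores the latest completed snapshot and
        // resumes processing from there instead of replaying the whole stream.
        env.enableCheckpointing(5_000L, CheckpointingMode.EXACTLY_ONCE);

        env.socketTextStream("localhost", 9999)
           .filter(line -> !line.isEmpty())
           .print();

        env.execute("Checkpointed streaming sketch");
    }
}
```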
For batch programs, Flink recomputes intermediate results that were lost due to a failure by re-reading the necessary input data and evaluating the relevant transformations again. This applies to all transformations, including `reduce` and `reduceGroup`.
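As a rough sketch of what this means in practice, the hypothetical DataSet job below configures a restart strategy; if a task fails, Flink restarts the job and recomputes the lost results by re-reading the input rather than restoring checkpointed state (the restart parameters and the generated data are illustrative assumptions):

```java
import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class DataSetRecoverySketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Illustrative restart strategy: retry the job up to 3 times,
        // waiting 10 seconds between attempts. On each restart, lost
        // intermediate results are recomputed from the input data.
        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, 10_000L));

        DataSet<Long> numbers = env.generateSequence(1, 1_000_000);

        // If a task executing this reduce fails, Flink re-reads the
        // affected input and re-applies the transformation; there is
        // no checkpointed operator state to restore.
        long sum = numbers.reduce((a, b) -> a + b).collect().get(0);

        System.out.println("Sum: " + sum);
    }
}
```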
Upvotes: 2