CaiDi


How long should the checkpoint interval be set?

I have a job that uses Flink to ingest data and write it to HDFS as Parquet files. Because I'm using StreamingFileSink in Flink, a file is only finalized once a checkpoint succeeds. I want to know how long the checkpoint interval should be. Which parameters should I take into account?


Answers (1)

David Anderson


The checkpoint interval determines:

  1. How much data may have to be reprocessed if there is a failure.
  2. How frequently the streaming file sink will finalize the Parquet output files (which, along with the parallelism, will affect how large or small they are).
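As a rough illustration of both points, here is a minimal Java sketch that estimates the worst-case replay window and the approximate size of each Parquet part file for a few candidate intervals. It assumes a steady, hypothetical ingest rate spread evenly across the sink's subtasks, a single active bucket, and one part file per subtask per checkpoint (StreamingFileSink rolls its in-progress files on every checkpoint for bulk formats such as Parquet). The class and method names are hypothetical, and sizes are uncompressed input bytes, so real Parquet files will usually be smaller.

```java
// Back-of-the-envelope estimate of how the checkpoint interval affects
// replay after a failure and the size of each finalized part file.
// Hypothetical assumptions: steady ingest rate, evenly spread over all
// sink subtasks, one active bucket, one part file per subtask per
// checkpoint. Sizes are uncompressed input bytes, not on-disk Parquet size.
public class CheckpointEstimate {

    // On failure the job restores from the last completed checkpoint, so
    // roughly one interval's worth of input may have to be reprocessed.
    public static long worstCaseReplayBytes(long bytesPerSecond, long intervalSeconds) {
        return bytesPerSecond * intervalSeconds;
    }

    // One interval's worth of input is split across `parallelism` files,
    // because each subtask finalizes its own part file at each checkpoint.
    public static long approxFileBytes(long bytesPerSecond, long intervalSeconds, int parallelism) {
        return bytesPerSecond * intervalSeconds / parallelism;
    }

    public static void main(String[] args) {
        long rate = 10L * 1024 * 1024; // hypothetical: 10 MiB/s of input
        int parallelism = 4;           // hypothetical sink parallelism
        long mib = 1024L * 1024;
        for (long interval : new long[] {60, 300, 900}) {
            System.out.printf("interval=%ds replay<=~%d MiB file~%d MiB%n",
                    interval,
                    worstCaseReplayBytes(rate, interval) / mib,
                    approxFileBytes(rate, interval, parallelism) / mib);
        }
    }
}
```

With these hypothetical numbers, going from a 1-minute to a 15-minute interval makes each file about fifteen times larger, but it also grows the amount of data that might be replayed after a failure by the same factor.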

Choose whatever makes sense given your tolerance for longer recovery times, for increased latency in other processes waiting for these files to be finalized, and for larger output files.
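One way to make that trade-off concrete is to start from a target file size and check whether the interval it implies is still within what you can tolerate. The sketch below is a hypothetical helper, not part of Flink: it assumes a steady ingest rate split evenly across the sink's subtasks and one part file per subtask per checkpoint, so file size is about rate × interval / parallelism. Because the replay window and the finalization delay both grow with the interval, a single maximum interval stands in for both limits here.

```java
import java.util.OptionalLong;

// Hypothetical helper: choose the shortest checkpoint interval whose
// estimated part-file size reaches a target, and reject it if it exceeds
// the longest interval you are willing to accept (which bounds both the
// replay window and how long downstream readers wait for finalized files).
// Assumes a steady rate, evenly spread over subtasks, and one part file
// per subtask per checkpoint; sizes are uncompressed input bytes.
public class IntervalChooser {

    public static OptionalLong chooseIntervalSeconds(long bytesPerSecond, int parallelism,
                                                     long targetFileBytes, long maxIntervalSeconds) {
        // fileBytes ~= rate * interval / parallelism,
        // so interval ~= target * parallelism / rate (rounded up).
        long needed = (targetFileBytes * parallelism + bytesPerSecond - 1) / bytesPerSecond;
        long interval = Math.max(1, needed);
        // If the target size needs a longer wait than you can tolerate, the
        // two goals conflict; report that instead of silently picking one.
        return interval <= maxIntervalSeconds ? OptionalLong.of(interval) : OptionalLong.empty();
    }

    public static void main(String[] args) {
        long rate = 10L * 1024 * 1024;    // hypothetical: 10 MiB/s of input
        long target = 256L * 1024 * 1024; // hypothetical: 256 MiB per file
        System.out.println(chooseIntervalSeconds(rate, 4, target, 600));
    }
}
```

If the result is empty, you have to give up something: accept smaller files, a longer recovery and latency window, or reduce the sink parallelism.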

Checkpointing also imposes some overhead on the cluster. More frequent checkpointing will impact performance.

