Reputation: 6095
I am new to Apache Flink and have been going through its examples. I found that, in case of a failure, Flink can restore stream processing from a checkpoint.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(10000L);
Now, my question is where does Flink keep the checkpoint(s) by default?
Any help is appreciated!
Upvotes: 2
Views: 2481
Reputation: 555
The default state backend is MemoryStateBackend. It keeps in-flight data in the TaskManager's JVM and checkpoints it to the heap of the master (JobManager). It's good for local debugging, but you will lose your checkpoints if the JobManager goes down.
For production, you would usually use FsStateBackend with a path to an external file system (HDFS, S3, etc.). It keeps in-flight data in the TaskManager's JVM and checkpoints it to the external file system.
For example:
env.setStateBackend(new FsStateBackend("file:///apps/flink/checkpoint"));
Optionally, a small metadata file pointing to the state store can also be configured for high availability.
Upvotes: 2
Reputation: 18987
Flink features the abstraction of StateBackends. A StateBackend is responsible for managing state locally on the worker node, and also for checkpointing (and restoring) that state to and from a remote location.
The default StateBackend is the MemoryStateBackend. It maintains the state on the workers' (TaskManagers') JVM heap and checkpoints it to the JVM heap of the master (JobManager). Hence, the MemoryStateBackend does not require any additional configuration or external system and is good for local development. However, it is not scalable and not suited for any serious workload.
Flink also provides an FsStateBackend, which also holds local state on the workers' JVM heap but checkpoints it to a remote file system (HDFS, NFS, ...). Finally, there's the RocksDBStateBackend, which stores state in an embedded disk-based key-value store (RocksDB) and also checkpoints to a remote file system (HDFS, NFS, ...).
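To make the practical difference concrete, here is a toy model in plain Java. It is only a sketch and does not use Flink's API: CheckpointStore, HeapStore, FileStore and restoreAfterCrash are hypothetical names made up for this illustration. It assumes the whole cluster (including the JobManager) goes down, so a heap-held checkpoint is gone after the restart, while a checkpoint written to a file system can still be read.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Toy model only: none of these types are Flink classes.
class CheckpointSketch {

    /** Where a snapshot of operator state is written to and read back from. */
    interface CheckpointStore {
        void write(long count) throws IOException;
        Optional<Long> readLatest() throws IOException;
    }

    /** Mimics the MemoryStateBackend idea: the snapshot lives only on a process heap. */
    static final class HeapStore implements CheckpointStore {
        private Long latest; // gone once the owning process dies
        public void write(long count) { latest = count; }
        public Optional<Long> readLatest() { return Optional.ofNullable(latest); }
    }

    /** Mimics the FsStateBackend idea: the snapshot is a file that outlives the process. */
    static final class FileStore implements CheckpointStore {
        private final Path file;
        FileStore(Path dir) { this.file = dir.resolve("checkpoint"); }
        public void write(long count) throws IOException {
            Files.write(file, Long.toString(count).getBytes(StandardCharsets.UTF_8));
        }
        public Optional<Long> readLatest() throws IOException {
            if (!Files.exists(file)) return Optional.empty();
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            return Optional.of(Long.parseLong(text.trim()));
        }
    }

    /** Checkpoints a counter at 5, advances it to 7, then "crashes" and restores. */
    static long restoreAfterCrash(CheckpointStore beforeCrash, CheckpointStore afterCrash)
            throws IOException {
        long count = 5;            // working state, held on the worker's heap
        beforeCrash.write(count);  // take a checkpoint
        count = 7;                 // progress after the checkpoint; lost in the crash
        // After the restart, resume from the last snapshot, or from scratch if none exists.
        return afterCrash.readLatest().orElse(0L);
    }

    public static void main(String[] args) throws IOException {
        // A fresh HeapStore stands in for the heap of the restarted master process.
        long fromHeap = restoreAfterCrash(new HeapStore(), new HeapStore());
        // Both FileStore instances point at the same directory, like a shared file system.
        Path dir = Files.createTempDirectory("checkpoints");
        long fromFile = restoreAfterCrash(new FileStore(dir), new FileStore(dir));
        System.out.println("heap: " + fromHeap + ", file: " + fromFile);
        // prints "heap: 0, file: 5"
    }
}
```

In both cases the progress made after the last checkpoint (the step from 5 to 7) is lost; the only difference is whether the checkpoint itself is still around to restart from.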
Upvotes: 6