Kyungmin Kim
Kyungmin Kim

Reputation: 69

Does Flink RocksDB statebackend help restoring state?

I'm considering using RocksDB as a statebackend of flink job which has state size up to 1TB.

My environment

If the job fails and retry attempts exceed maximum retry count and the job completely dies (or canceling the job), I think the checkpoint and the rocksdb file will be deleted(because I'm deploying job as per-job-mode and the task manager would also terminate).
Here, I think I lose all state and have no way to restore the state but I expect using RocksDB would help something to restore the state because it is a disk based statebackend. If not, what is the advantage of using RocksDB statebackend?

Would retaining the checkpoint on cancellation and restart the job from the checkpoint(or savepoint) help in this case?
Thank you

Upvotes: 0

Views: 316

Answers (1)

Martijn Visser
Martijn Visser

Reputation: 2108

I would recommend to check out https://nightlies.apache.org/flink/flink-docs-master/docs/ops/production_ready/ for an overview of steps to consider before putting a Flink application in production. Choosing the right state backend is one of them.

What is important for state recovery is that you enable the snapshotting mechanism. That can be either checkpoints or savepoints, which you use with the configured state backend (like RocksDB). When configured properly, your state will be snapshotted to a durable storage, so you can recover from it in case of failures. RocksDB is commonly used for large state sizes, which can't fit into memory anymore.

Upvotes: 2

Related Questions