deeplay

Reputation: 376

How to recover Flink job (Kubernetes) from savepoint?

I'm running a Flink 1.14 app (the jar is embedded in a Docker image) on a Kubernetes cluster. Configs such as parallelism, numberOfTaskSlots, etc. are specified in a ConfigMap as flink-conf.yaml. The checkpoint directory (HDFS) is hardcoded in the jar (setCheckpointStorage("hdfs://...")). The savepoint location is not specified anywhere.
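Roughly, the ConfigMap looks like this (a simplified sketch; the name and values are illustrative, not my exact config):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: flink-config          # illustrative name
    data:
      flink-conf.yaml: |
        taskmanager.numberOfTaskSlots: 2
        parallelism.default: 4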

Everything works fine: checkpoints are created, and in case of errors the application is automatically restored from them.

Question: how do I trigger a savepoint manually, and then manually reload the application from that savepoint?

Please take into account that I'm on a Kubernetes cluster in Flink application mode (each Flink app is an independent k8s deployment). I'm aware of the -s hdfs://... parameter, but I'm not sure how to apply it in my case.
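For reference, this is the generic CLI flow I know from the docs; the job ID and paths are placeholders, and it's exactly this part that I don't know how to translate to application mode on Kubernetes:

    # trigger a savepoint for the running job
    ./bin/flink savepoint <jobId> hdfs://namenode:8020/flink/savepoints

    # resubmit the job, restoring from the savepoint that was just written
    ./bin/flink run -s hdfs://namenode:8020/flink/savepoints/savepoint-<id> app.jar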

Upvotes: 0

Views: 379

Answers (1)

Cosmin Ioniță

Reputation: 4045

From the official Flink Kubernetes Operator docs:

Users can trigger savepoints manually by defining a new (different/random) value to the variable savepointTriggerNonce in the job specification:

    job:
      ...
      savepointTriggerNonce: 123

So what I would try is to run kubectl edit on the FlinkDeployment resource for the Flink job, update savepointTriggerNonce to a new random value, and then restart the job. It should start from the last savepoint. A sketch of where the field lives is below.
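Assuming the deployment is managed by the Flink Kubernetes Operator (which is where savepointTriggerNonce comes from), the relevant part of the FlinkDeployment spec would look roughly like this; the resource name and nonce value are placeholders:

    apiVersion: flink.apache.org/v1beta1
    kind: FlinkDeployment
    metadata:
      name: my-flink-app            # placeholder name
    spec:
      job:
        upgradeMode: savepoint      # restarts/upgrades resume from the last savepoint
        savepointTriggerNonce: 124  # set any new value to trigger a manual savepoint

Editing the nonce (e.g. kubectl edit flinkdeployment my-flink-app) makes the operator trigger a savepoint, and it records the resulting savepoint path in the resource status.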

Upvotes: 0
