aarexer

Reputation: 553

Flink job restore from state after code changes

I am using Apache Flink 1.9 with the standard checkpoint/savepoint mechanism, writing to the file system.

My question is: what is the proper way to restore a job from a savepoint if the job's code has changed? For example, after a refactoring in which I renamed a few classes, I can no longer restore from the old checkpoint.

I lose my data, so I want to ask: what can I do in such cases?

All operators have a uid and a name set.

Upvotes: 1

Views: 1503

Answers (2)

mingtao wang

Reputation: 115

It seems your state cannot be treated as a POJO type (POJOs are classes that follow a certain bean-like pattern). When a user-defined data type can't be recognized as a POJO type, it is processed as a GenericType and serialized with Kryo. Currently, Flink supports schema evolution only for POJO and Avro types. Therefore, if you care about schema evolution for your state, it is recommended to always use either POJO or Avro types for state data.
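As a minimal sketch of the "bean-like pattern" mentioned above: the class below follows the POJO rules as described in the Flink types documentation (public class, public no-argument constructor, and every field either public and non-final or reachable through a getter/setter pair). The class name `SessionStats` and its fields are hypothetical and only illustrate the shape; they are not taken from the question.

```java
// Hypothetical state type; the name and fields are illustrative assumptions.
public class SessionStats {

    // Public, non-final fields qualify without accessors.
    public String userId;
    public long eventCount;

    // A private field qualifies when it has a bean-style getter and setter.
    private double totalAmount;

    // Public no-argument constructor, needed so the type can be instantiated
    // during deserialization.
    public SessionStats() {
    }

    // Convenience constructor; optional as far as the POJO rules are concerned.
    public SessionStats(String userId, long eventCount, double totalAmount) {
        this.userId = userId;
        this.eventCount = eventCount;
        this.totalAmount = totalAmount;
    }

    public double getTotalAmount() {
        return totalAmount;
    }

    public void setTotalAmount(double totalAmount) {
        this.totalAmount = totalAmount;
    }
}
```

Note that, according to the schema evolution documentation, POJO evolution covers adding and removing fields, but the class name of a POJO type (including its package) cannot change. A renamed state class may therefore still fail to restore even when it is a valid POJO.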

Some docs, FYI:
https://ci.apache.org/projects/flink/flink-docs-stable/dev/types_serialization.html
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/schema_evolution.html

Upvotes: 1

Dominik Wosiński

Reputation: 3864

In short: it depends.

To elaborate: it generally shouldn't be an issue if you have only reordered and renamed the classes, as long as the UIDs have not changed. Refactoring, however, may change how the state is stored and can thus prevent it from being restored. In that case you can use the --allowNonRestoredState parameter, which lets the job restore whatever state is still available from the savepoint and start the remaining operators with clean state. Keep in mind that this may not restore all of the state. In general, you shouldn't refactor stateful operators once they are running, since doing so can effectively prevent restoring from a savepoint.

It's worth noting that it may not be possible to restore from a savepoint if you are using SQL; see the FLINK-6966 issue.

I assume that you are dealing with savepoints, not externalized checkpoints; otherwise there are a few more things to keep in mind, especially when changing parallelism.

Upvotes: 2
