Reputation: 4120
This is mostly pie-in-the-sky brainstorming; I'm not expecting concrete answers, but I am hoping for some pointers.
I'm imagining a workflow where we trigger a savepoint and inspect the savepoint files to look at the state for specific operators -- as a debugging aid, perhaps, or as a simpler(?) way of achieving what we might otherwise do with queryable state...
Assuming that could work, how about the possibility of modifying / fixing the data in the savepoint to be used when restarting the same or a modified version of the job?
Or perhaps generating a savepoint more or less from scratch to define the initial state for a new job? Sort of in lieu of feeding data in to backfill state?
Do such facilities already exist? My guess is no, based on what I've been able to find so far. If not, how would I go about building something like that? My high-level idea so far is roughly:
savepoint -->
SavepointV2Serializer.deserialize -->
write to JSON -->
manually inspect / edit the JSON files, or use other tooling that works with JSON to inspect / modify them -->
SavepointV2Serializer.serialize -->
new savepoint
I haven't actually written any code yet, so I really don't know how feasible that is. Thoughts?
Upvotes: 1
Views: 943
Reputation: 43524
You want to use the State Processor API, which is coming soon as part of Flink 1.9. It will make it possible to read, write, and modify savepoints using Flink's batch DataSet API.
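To give a sense of what that enables, here's a rough sketch based on the State Processor API as it's shaping up for 1.9 (details may still shift): reading keyed state out of an existing savepoint, and bootstrapping initial state for a new job. The savepoint paths, the operator uids, the "total" state name, and the KeyedTotal / Account POJOs are all placeholders for whatever your job actually uses.

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.runtime.state.memory.MemoryStateBackend;
    import org.apache.flink.state.api.BootstrapTransformation;
    import org.apache.flink.state.api.ExistingSavepoint;
    import org.apache.flink.state.api.OperatorTransformation;
    import org.apache.flink.state.api.Savepoint;
    import org.apache.flink.state.api.functions.KeyedStateBootstrapFunction;
    import org.apache.flink.state.api.functions.KeyedStateReaderFunction;
    import org.apache.flink.util.Collector;

    public class StateProcessorSketch {

        // --- Reading keyed state out of an existing savepoint ---------------

        // Simple POJO to hold the extracted key/value pairs.
        public static class KeyedTotal {
            public String key;
            public Long total;
        }

        // Reads the "total" ValueState for every key of the targeted operator.
        public static class TotalReader extends KeyedStateReaderFunction<String, KeyedTotal> {
            private transient ValueState<Long> totalState;

            @Override
            public void open(Configuration parameters) {
                totalState = getRuntimeContext().getState(
                        new ValueStateDescriptor<>("total", Types.LONG));
            }

            @Override
            public void readKey(String key, Context ctx, Collector<KeyedTotal> out) throws Exception {
                KeyedTotal result = new KeyedTotal();
                result.key = key;
                result.total = totalState.value();
                out.collect(result);
            }
        }

        public static void inspect(ExecutionEnvironment env) throws Exception {
            // The state backend should match the one the streaming job used;
            // MemoryStateBackend here is just a placeholder.
            ExistingSavepoint savepoint =
                    Savepoint.load(env, "hdfs:///savepoints/savepoint-xxxxxx", new MemoryStateBackend());

            // "my-operator-uid" must match the uid() set on the stateful operator
            // in the streaming job.
            DataSet<KeyedTotal> totals =
                    savepoint.readKeyedState("my-operator-uid", new TotalReader());

            // From here it's an ordinary DataSet: print it, dump it to JSON, join it, etc.
            totals.print();
        }

        // --- Bootstrapping initial state for a new job ----------------------

        public static class Account {
            public Integer id;
            public Double amount;
        }

        // Writes each incoming Account into keyed "total" state.
        public static class AccountBootstrapper extends KeyedStateBootstrapFunction<Integer, Account> {
            private transient ValueState<Double> totalState;

            @Override
            public void open(Configuration parameters) {
                totalState = getRuntimeContext().getState(
                        new ValueStateDescriptor<>("total", Types.DOUBLE));
            }

            @Override
            public void processElement(Account account, Context ctx) throws Exception {
                totalState.update(account.amount);
            }
        }

        public static void bootstrap(ExecutionEnvironment env, DataSet<Account> accounts) throws Exception {
            BootstrapTransformation<Account> transformation = OperatorTransformation
                    .bootstrapWith(accounts)
                    .keyBy(account -> account.id)
                    .transform(new AccountBootstrapper());

            Savepoint
                    .create(new MemoryStateBackend(), 128)  // 128 = max parallelism of the new savepoint
                    .withOperator("accounts-uid", transformation)
                    .write("hdfs:///savepoints/bootstrapped");

            env.execute("write bootstrapped savepoint");
        }
    }

Whatever comes out of readKeyedState is an ordinary DataSet, so inspecting it, dumping it to JSON, or transforming it and feeding it back through Savepoint.create / withOperator should cover the inspect, modify, and backfill use cases you describe, without having to touch the savepoint serialization format directly.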
Upvotes: 1