Reputation: 225
I want to know how to checkpoint a window. For example, windowed wordcount:
DataStream<Tuple3<String, Long, Long>> counts =
// split up the lines in pairs (2-tuples) containing: (word,1)
text
.flatMap(new Tokenizer())
.assignTimestampsAndWatermarks(new timestamp())
.keyBy(0)
.timeWindow(Time.seconds(2))
.process(new CountFunction())
Q1: What state should I save in CountFunction()
? Do I need to save the buffer element of the window? Should I use ListState
to store the buffered data in the window and use ValueState
to store the current sum value?
Q2: When the fault occurs, how are the elements in the window handled? What happens when the window is restored?
Thank you for the help.
Upvotes: 2
Views: 1420
Reputation: 43439
All of the state needed for Flink's windowing APIs is managed by Flink -- so you don't need to do anything. So long as checkpointing is enabled, the window buffer will be checkpointed and restored as needed.
Normally the CountFunction won't have any state that needs to be checkpointed. If the job fails while CountFunction is in the middle of iterating over the window's contents, the job will be rewound, and CountFunction will be called again with the same inputs.
If you do need to keep state in your CountFunction, then see Using per-window state in ProcessWindowFunction for information on how to go about that. It sounds like you will want to use globalState() (state that endures across all time), which you can access via the Context object passed to your process window function.
While you don't have a keyed stream, I suggest you use the keyed state mechanism described above. You can transform your non-keyed stream into a keyed stream by using keyBy with a constant key.
Upvotes: 1