Windows and States in Apache Flink

Question

The first question:

I want to understand how windows are aligned in time. Suppose I process data in 2-second windows using processing time, and I deploy the job when the current time is 10:26:25.169.

In this case, will each window be aligned to even seconds (0, 2, 4, and so on), like below?

10:26:24.000 - 10:26:26.000
10:26:26.000 - 10:26:28.000

As you can see, I deployed the job at 10:26:25.169, but Flink aligned the windows to 2-second boundaries. Is that right?

If not, do the windows work like below?

10:26:25.169 - 10:26:27.169
10:26:27.169 - 10:26:29.169

Which one is correct? Does this behaviour change if I use event time instead of processing time?

The second question:

I want to keep state for each key. For that I can use a RichFlatMapFunction or a KeyedProcessFunction. But can I also manage state with these functions after a window has been applied? For example:

// in this case I can manage state by key
ds.keyBy(_.field).process(new MyStateFunction)

// in this case, can I manage state after the window for the same key?
ds.keyBy(keyExtractorWithTime)
  .window(new MyWindowFunction)
  .reduce(new myRedisFunction)
  .process(new MyStateFunction)

Dominik Wosiński · Accepted Answer

As for the first question, the windows are always aligned to full 2-second intervals, so it works as you described:

10:26:24.000 - 10:26:26.000
10:26:26.000 - 10:26:28.000

There is an offset argument that lets you shift that alignment to some extent. Flink only creates a window once data arrives for it, but the window's start and end times do not depend on when the data arrives: each element is assigned to a window with fixed boundaries, not the other way around.
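To make the alignment concrete, here is a minimal sketch of the arithmetic, with no Flink dependency. It follows, as far as I understand it, the calculation in Flink's TimeWindow.getWindowStartWithOffset; the object and helper names (WindowAlignment, windowStart, millisOfDay) are made up for this illustration. My understanding is that event-time tumbling windows use the same calculation, with the element's timestamp in place of the wall-clock time.

```scala
import java.time.LocalTime

object WindowAlignment {
  // Start of the tumbling window that contains `timestamp` (all values in ms).
  // Assumption: this mirrors Flink's TimeWindow.getWindowStartWithOffset.
  def windowStart(timestamp: Long, offset: Long, size: Long): Long = {
    val remainder = (timestamp - offset) % size
    // The remainder can be negative when timestamp < offset.
    if (remainder < 0) timestamp - (remainder + size) else timestamp - remainder
  }

  // Milliseconds since midnight. A day is a whole multiple of 2000 ms, so the
  // alignment is the same as for epoch-millisecond timestamps.
  def millisOfDay(time: String): Long =
    LocalTime.parse(time).toNanoOfDay / 1000000L

  def main(args: Array[String]): Unit = {
    val size = 2000L
    val now = millisOfDay("10:26:25.169")

    val aligned = windowStart(now, 0L, size)    // default: no offset
    val shifted = windowStart(now, 1169L, size) // hypothetical 1169 ms offset

    println(s"no offset:      [$aligned, ${aligned + size})")
    println(s"offset 1169 ms: [$shifted, ${shifted + size})")
  }
}
```

With no offset, an element arriving at 10:26:25.169 falls into the window starting at 37584000 ms after midnight, i.e. 10:26:24.000 to 10:26:26.000. An offset of 1169 ms would be needed to get windows that start at 10:26:25.169, and that offset is a fixed job parameter, not something derived from the deployment time.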

More info can be found here

As for the second question, ProcessWindowFunction is a keyed function, so you can use keyed state inside it just as you would in a standard ProcessFunction.
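A rough sketch of what that could look like, assuming the Flink 1.x Scala DataStream API. Event, SumEvents, RunningTotal and the pipeline helper are hypothetical names for illustration, not taken from the question. The ProcessWindowFunction is passed as the second argument to reduce, so it still runs in the keyed window context, and it keeps a per-key running total across windows via context.globalState.

```scala
import org.apache.flink.api.common.functions.ReduceFunction
import org.apache.flink.api.common.state.ValueStateDescriptor
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.scala.function.ProcessWindowFunction
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

// Hypothetical record type for this sketch.
case class Event(key: String, value: Long)

// Pre-aggregates the elements of one window into a single Event per key.
class SumEvents extends ReduceFunction[Event] {
  override def reduce(a: Event, b: Event): Event = Event(a.key, a.value + b.value)
}

// Emits (key, sum of this window, running total over all windows so far).
class RunningTotal extends ProcessWindowFunction[Event, (String, Long, Long), String, TimeWindow] {
  private val totalDesc =
    new ValueStateDescriptor[java.lang.Long]("running-total", classOf[java.lang.Long])

  override def process(
      key: String,
      context: Context,
      elements: Iterable[Event],
      out: Collector[(String, Long, Long)]): Unit = {
    // globalState is keyed state scoped to the key, not to a single window.
    val total = context.globalState.getState(totalDesc)
    // After the reduce, `elements` holds the single pre-aggregated value.
    val windowSum = elements.head.value
    val previous = Option(total.value()).map(_.longValue).getOrElse(0L)
    val updated = previous + windowSum
    total.update(java.lang.Long.valueOf(updated))
    out.collect((key, windowSum, updated))
  }
}

object WindowStateSketch {
  def pipeline(ds: DataStream[Event]): DataStream[(String, Long, Long)] =
    ds.keyBy(_.key)
      .window(TumblingProcessingTimeWindows.of(Time.seconds(2)))
      .reduce(new SumEvents, new RunningTotal)
}
```

One thing to watch in the snippet from the question: calling .process(...) on the result of .reduce(...) operates on a plain DataStream, which is no longer keyed, so keyed state is not available there. Either pass the ProcessWindowFunction to reduce as above, or call keyBy again on the reduced stream before a KeyedProcessFunction.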
