ethrbunny
ethrbunny

Reputation: 10469

flink - folds on keyed windows

Working with some data that I want to perform a fold on every 15 seconds. From the 'outside' it looks as if the window is holding all the data for the duration and then submitting it to the fold function all at once.

Truth?

If so, is there a way to have the fold function called every time a new piece of data is submitted and then just the result get returned at the end of the window?

Is there some other combination of transformations that could be put together to achieve this effect?

Upvotes: 0

Views: 254

Answers (1)

aljoscha
aljoscha

Reputation: 986

Your observation is correct, yes. The reason is that the current implementation of the windowing operators is somewhat limited. Conceptually, there are two elements in a window operator: the window buffer and the window function. Let's assume the input type of the window operator is IN the output type is OUT. Now, the window buffer stores elements of type IN and when it comes time to emit elements it emits elements of type IN. The window function gets as input a collection of elements IN and emits elements of type OUT (Collection[IN] -> OUT).

If the window function is a reduce function we can pre-aggregate inside the window buffer since the signature of that is (IN, IN) -> IN. The window function basically only gets one element from the window buffer that it can emit.

If we want an efficient fold things get slightly more complex because we need the window buffer to take elements of type IN but emit type OUT and the window function to look like this: OUT -> OUT.

It is possible to do it but it's just not implemented that way right now. (By the way, I opened a Jira Issue for this: https://issues.apache.org/jira/browse/FLINK-2991)

Upvotes: 2

Related Questions