Reputation: 61
I have read a lot of articles recently, including the official documentation, trying to understand how the Global Window works in Apache Beam. I have read similar questions here on Stack Overflow, but I still couldn't come to an understanding.
According to the official docs:
You can use the single global window if you are working with an unbounded data set (e.g. from a streaming data source) but use caution when applying aggregating transforms such as GroupByKey and Combine. The single global window with a default trigger generally requires the entire data set to be available before processing, which is not possible with continuously updating data.
So the Global Window has no end, which makes sense since it's global. The docs recommend using a non-default trigger when doing aggregations, because the default trigger fires panes only when the window closes:
Set a non-default trigger. This allows the global window to emit results under other conditions, since the default windowing behavior (waiting for all data to arrive) will never occur.
I'm confused by this. The logic here would be that the Global Window wouldn't be able to fire events to the next step of the pipeline, because it never ends and so the default trigger never fires. However, this isn't what happens in a real scenario. If I read from an unbounded PCollection with a global window, the events are still pushed downstream.
Could someone clarify this for me? How does the default Global Window with the default trigger work in Apache Beam for unbounded PCollections? I'm assuming that it does not aggregate results at all and just handles the events as they arrive, one by one. I would like to be sure that's the case.
Upvotes: 6
Views: 4500
Reputation: 2024
The default trigger fires when the watermark reaches the end of the window, based on event time. This never occurs for a GlobalWindow, so if you use a GlobalWindow the default trigger will never fire.
But if you set a non-default trigger, for example one that fires after a certain number of elements have been processed (using the AfterCount trigger), your elements can be emitted even for a GlobalWindow. See here for more information regarding Beam triggers.
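To make the difference concrete, here is a small toy model in plain Python. It is not the Beam API: run_global_window, after_watermark and after_count are hypothetical names made up for this illustration, and the watermark is simplified to "the latest event time seen". It only sketches why a watermark-based trigger never fires in a window with no end, while a count-based trigger does.

```python
import math

# Assumption for this sketch: the global window's end is modelled as infinity.
GLOBAL_WINDOW_END = math.inf


def run_global_window(events, should_fire):
    """Buffer events in one global window and emit a pane when should_fire says so."""
    panes = []
    buffer = []
    for event_time, value in events:
        buffer.append(value)
        watermark = event_time  # simplification: watermark follows the latest event time
        if should_fire(buffer, watermark):
            panes.append(list(buffer))
            buffer.clear()  # discard fired elements, similar in spirit to a discarding mode
    return panes


def after_watermark(buffer, watermark):
    """Toy stand-in for the default trigger: fire once the watermark passes the window end."""
    return watermark >= GLOBAL_WINDOW_END


def after_count(n):
    """Toy stand-in for a count-based trigger: fire once n elements are buffered."""
    def trigger(buffer, watermark):
        return len(buffer) >= n
    return trigger


events = [(t, f"e{t}") for t in range(1, 8)]  # seven events with event times 1..7

default_panes = run_global_window(events, after_watermark)
count_panes = run_global_window(events, after_count(3))

print(default_panes)  # [] because the watermark never reaches infinity
print(count_panes)    # [['e1', 'e2', 'e3'], ['e4', 'e5', 'e6']]
```

Note that e7 stays in the buffer in the count-based run, since only one element has arrived after the second firing.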
Upvotes: 2
Reputation: 141
Triggers let us decide when the results of a window are computed.
The default trigger is a repeated execution of the AfterWatermark trigger, and AfterWatermark fires the pane once the watermark passes the end of the window.
Coming back to your question:
How does the default Global Window with the default trigger work in Apache Beam for unbounded PCollections?
If you use the global window with the default trigger, the data will never be aggregated, because the data set keeps updating.
The trigger never fires, since the global window never ends.
And yes, your assumption is correct: it does not aggregate results at all and just handles the events as they arrive, one by one. Windowing and triggers only take effect at grouping transforms such as GroupByKey and Combine, so element-wise steps pass each event downstream immediately.
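As a rough illustration of "one by one", here is a plain-Python sketch using generators. It is only an analogy, not Beam code: unbounded_source and element_wise are hypothetical names invented for this example. An element-wise step can hand each result downstream as soon as its input arrives, whereas an aggregation over the whole stream would need the stream to end first.

```python
import itertools


def unbounded_source():
    """Hypothetical never-ending source: yields 1, 2, 3, ... forever."""
    for i in itertools.count(1):
        yield i


def element_wise(stream, fn):
    """Apply fn to each element and pass the result on immediately."""
    for element in stream:
        yield fn(element)


# Element-wise processing works on an endless stream; we just stop reading after five results.
first_five = list(itertools.islice(element_wise(unbounded_source(), lambda x: x * 10), 5))
print(first_five)  # [10, 20, 30, 40, 50]

# An aggregation over the whole stream, e.g. sum(unbounded_source()), would never return,
# because it has to wait for an end of input that never comes.
```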
Upvotes: 1