Reputation: 2129
I wanted to have global window which lasts for 10 hours after seeing the first element, but what is happening data is being emitted after a few minutes (or seconds). Why?
Code:
grouped_tis = tracking_informations | beam.WindowInto(window.GlobalWindows(),
trigger=AfterProcessingTime(10 * 3600),
accumulation_mode=AccumulationMode.DISCARDING) | beam.GroupByKey() | beam.ParDo(MergeTI())
In dataflow After 30 minutes I already get a lot of dropped elements: droppedDueToClosedWindow 39,147 GroupByKey
Upvotes: 1
Views: 226
Reputation: 106
This looks like a bug in SDK. I've created a jira ticket for Apache Beam Python SDK devs to look into the issue.
It seems that AfterProcessingTime fires early and it causes the window to close. All events that come after are properly discarded due to window being closed.
Upvotes: 1