Reputation: 1855
The Apache Beam documentation discusses windowing with bounded PCollections:
https://beam.apache.org/documentation/programming-guide/#windowing
If we have bounded data, then all the elements would belong to the same global window. In which use cases would we need to apply windowing in batch processing?
Upvotes: 1
Views: 1005
Reputation: 779
Bounded data does not have a notion of time or watermarks. So in batch pipelines, a window is essentially just one part of a multipart key that is used when grouping.
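To illustrate the "window as part of a multipart key" idea, here is a minimal plain-Python sketch (not Beam code). It assumes one-hour fixed windows over integer timestamps in seconds and computes the window start the way a fixed window with zero offset would (`ts - ts % size`). It then groups records by the composite key `(key, window)`. The function names and the record shape `(key, timestamp, value)` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

HOUR = 3600  # assumed window size: one-hour fixed windows, timestamps in seconds

Window = Tuple[int, int]  # [start, end) interval


def assign_fixed_window(ts: int, size: int = HOUR) -> Window:
    """Return the [start, end) fixed window that contains timestamp ts."""
    start = ts - (ts % size)
    return (start, start + size)


def group_by_key_and_window(
    events: Iterable[Tuple[str, int, str]], size: int = HOUR
) -> Dict[Tuple[str, Window], List[str]]:
    """Group (key, timestamp, value) records by the composite key (key, window)."""
    groups: Dict[Tuple[str, Window], List[str]] = defaultdict(list)
    for key, ts, value in events:
        # The window is just another component of the grouping key.
        groups[(key, assign_fixed_window(ts, size))].append(value)
    return dict(groups)
```

With this view, records for the same key that fall into different hours end up in different groups, exactly as if the hour had been added to the key.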
One possible use case is getting the list of unique users per hour from a corpus of one day's data. However, this can also be done by applying your own key (the hour) and grouping.
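A minimal plain-Python sketch of the two approaches for "unique users per hour", assuming events are `(user, timestamp_in_seconds)` pairs. In the Beam Python SDK the windowed variant would roughly correspond to attaching timestamps and applying `beam.WindowInto(window.FixedWindows(3600))` before a distinct; the sketch below avoids Beam so it runs standalone, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

HOUR = 3600  # assumed: timestamps are integer seconds


def unique_users_per_window(events: Iterable[Tuple[str, int]]) -> Dict[Tuple[int, int], Set[str]]:
    """Approach 1: assign each event to a one-hour fixed window, then dedupe per window."""
    out: Dict[Tuple[int, int], Set[str]] = defaultdict(set)
    for user, ts in events:
        start = ts - ts % HOUR
        out[(start, start + HOUR)].add(user)
    return dict(out)


def unique_users_per_hour_key(events: Iterable[Tuple[str, int]]) -> Dict[int, Set[str]]:
    """Approach 2: compute an explicit hour-bucket key yourself, then dedupe per key."""
    out: Dict[int, Set[str]] = defaultdict(set)
    for user, ts in events:
        out[ts // HOUR].add(user)  # hour index used as an ordinary grouping key
    return dict(out)
```

Both produce the same buckets; the windowed version just keeps the hour in the window instead of in the key, which is why either approach works for bounded data.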
Beam has a unified batch and streaming model with similar APIs for both, so the windowing concept is available in batch as well as in streaming.
Upvotes: 1