Reputation: 63
I'm trying to calculate some sliding average for a bounded dataset, which have dates attached to it as well as some value.
Based on the docs from: https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/windowing/SlidingWindows and https://cloud.google.com/dataflow/model/windowing#sliding-time-windows
First I am emitting the datestamp with outputWithTimestamp
, dividing the timestamps into:
Window.into(
SlidingWindows
.of(Duration.standardDays(3))
.every(Duration.standardDays(1)))
So for a PCollection of dataset:
[Jan 3rd, 100]
[Jan 4th, 200]
[Jan 5th, 400]
The output PCollection I am seeing is [100, 300, 700, 600, 400]
, which seems to imply the windowing function starts with a window of Jan 1st - 3rd, and ends with a window of Jan 5th - Jan 7th. Does that make sense that the first window seems to start before my PCollection?
Upvotes: 0
Views: 568
Reputation: 6023
If you were to indicate the window associated with each element in your output PCollection, you would see this:
[Jan 1-3, 100]
[Jan 2-4, 300]
[Jan 3-5, 700]
[Jan 4-6, 600]
[Jan 5-7, 400]
Event time is "platonic" in the sense that it all exists "all at once". If you have a dataset where you know the data is complete only for a particular interval, you can filter these results to remove the values that do not fall within the interval with good data.
Upvotes: 1