iggybbz
iggybbz

Reputation: 63

Sliding Windows Starting Point

I'm trying to calculate some sliding average for a bounded dataset, which have dates attached to it as well as some value.

Based on the docs from: https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/windowing/SlidingWindows and https://cloud.google.com/dataflow/model/windowing#sliding-time-windows

First I am emitting the datestamp with outputWithTimestamp, dividing the timestamps into:

Window.into(
    SlidingWindows
        .of(Duration.standardDays(3))
        .every(Duration.standardDays(1)))

So for a PCollection of dataset:

[Jan 3rd, 100]
[Jan 4th, 200]
[Jan 5th, 400]

The output PCollection I am seeing is [100, 300, 700, 600, 400], which seems to imply the windowing function starts with a window of Jan 1st - 3rd, and ends with a window of Jan 5th - Jan 7th. Does that make sense that the first window seems to start before my PCollection?

Upvotes: 0

Views: 568

Answers (1)

Kenn Knowles
Kenn Knowles

Reputation: 6023

If you were to indicate the window associated with each element in your output PCollection, you would see this:

[Jan 1-3, 100]
[Jan 2-4, 300]
[Jan 3-5, 700]
[Jan 4-6, 600]
[Jan 5-7, 400]

Event time is "platonic" in the sense that it all exists "all at once". If you have a dataset where you know the data is complete only for a particular interval, you can filter these results to remove the values that do not fall within the interval with good data.

Upvotes: 1

Related Questions