Reputation: 463
I am running some test jobs on Azure Stream Analytics running this query:
SELECT System.Timestamp AS ts, Collect()
INTO [output-queue]
FROM [input-hub] TIMESTAMP BY tapp
GROUP BY HoppingWindow(second, 4, 2)
and it turns out that, in some cases, the timestamp for the window end is a multiple of the window slide parameter, but sometimes not.
For example, with slide = 2 you get these window closing timestamps:
2016-08-04T10:36:40.0000000Z
2016-08-04T10:36:42.0000000Z
2016-08-04T10:36:44.0000000Z
2016-08-04T10:36:46.0000000Z
2016-08-04T10:36:48.0000000Z
Or, in the case slide = 5:
2016-08-04T14:55:15.0000000Z
2016-08-04T14:55:20.0000000Z
2016-08-04T14:55:25.0000000Z
2016-08-04T14:55:30.0000000Z
This holds for other slide values as well (e.g. 2, 3, 4, 6, ...), and it always holds, no matter when the job was started.
Other values (such as 7 or 11), however, do not follow this rule.
Can somebody explain why this happens?
I am wondering how Azure SA decides when to open the first window.
Thank you so much!
Upvotes: 1
Views: 1171
Reputation: 491
There are different kinds of windows (see here for more details).
First, window starts/ends do not depend on the job start time.
Tumbling and Hopping windows are best thought of, logically, as partitioning the timeline itself. For example, applying a Tumbling window of 1 minute makes results appear only at time values that are multiples of 1 minute, i.e. 2:00pm, 2:01pm, etc.
Note that not every 1-minute boundary must have a window result; that depends on the computation.
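To make the "partitioning the timeline" idea concrete, here is a minimal Python sketch of how window ends could be computed if boundaries sit at multiples of the hop size counted from a fixed origin. The origin used here (the Unix epoch) and the helper name `window_ends` are assumptions for illustration only, not the documented behavior of Stream Analytics. Under that assumption, hop sizes that divide 60 (2, 3, 4, 5, 6) always land on seconds that are multiples of the hop, while 7 or 11 drift relative to the minute, which would match what the question observes.

```python
from datetime import datetime, timedelta, timezone

# Assumed origin of the timeline; the real service may use a different one.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def window_ends(start, hop_seconds, count):
    """Return the first `count` hop boundaries strictly after `start`.

    Boundaries are multiples of `hop_seconds` counted from EPOCH,
    so they do not depend on when the job (i.e. `start`) begins.
    """
    elapsed = int((start - EPOCH).total_seconds())
    first = elapsed - elapsed % hop_seconds + hop_seconds
    return [EPOCH + timedelta(seconds=first + i * hop_seconds)
            for i in range(count)]

if __name__ == "__main__":
    job_start = datetime(2016, 8, 4, 10, 36, 39, tzinfo=timezone.utc)
    for hop in (2, 5, 7):
        seconds = [end.second for end in window_ends(job_start, hop, 10)]
        print(hop, seconds)
```

Changing `job_start` shifts which boundary comes first, but not where the boundaries are.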
A Sliding window can produce output at any point on the timeline and, unlike Tumbling and Hopping windows, does depend on the timestamps of the input events. The best way to think of a Sliding window is that a window may end at any input event and starts the slide amount of time before that event. In other words, for each event, the window includes all events that occurred at that event's time or up to the slide time before it.
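The sliding-window description can be sketched in Python as follows. This only mirrors the mental model given above (one window ending at each input event); the function name `sliding_windows` and the choice to exclude the exact start boundary are assumptions, and the real service may emit results at other points as well.

```python
from datetime import datetime, timedelta

def sliding_windows(timestamps, size):
    """For each event, list the events falling in (event - size, event]."""
    ordered = sorted(timestamps)
    windows = []
    for end in ordered:
        start = end - size  # window opens `size` before the event
        members = [ts for ts in ordered if start < ts <= end]
        windows.append((end, members))
    return windows

if __name__ == "__main__":
    base = datetime(2016, 8, 4, 10, 36, 0)
    events = [base + timedelta(seconds=s) for s in (0, 3, 4, 10)]
    for end, members in sliding_windows(events, timedelta(seconds=5)):
        print(end.time(), len(members))
```

Here the window ends are driven entirely by the event timestamps, in contrast to the fixed boundaries of Tumbling and Hopping windows.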
Hope this helps.
Upvotes: 4