Xavier John
Xavier John

Reputation: 9437

How to get the start and end time of a tumbling window?

I get a stream of data that has CreatedOn and UpdatedOn time field. I need to get a count of Created and Updated records by TumblingWindow of 1 hour.

How can I filter CreatedOn to be between the TumblingWindow start and end time?

Upvotes: 1

Views: 325

Answers (1)

Florian Eiden
Florian Eiden

Reputation: 842

Edited following comments below

I'll make the following assumptions:

  • If a row is created in your upstream system, you get a record with equal CreatedOn and UpdatedOn timestamps
  • If a row is updated in your upstream system, you get a record with UpdatedOn different from CreatedOn

Let's say your input records look like:

{"rowId" : 123, "CreatedOn": "2022-01-01 00:00:00", "UpdatedOn": null }
{"rowId" : 123, "CreatedOn": "2022-01-01 00:00:00", "UpdatedOn": "2022-01-01 01:10:00"}

With the following query you can get a count of created/updated rows per hour:

SELECT
    DATEADD(hour,-1,System.Timestamp()) AS WindowStart,
    System.Timestamp() AS WindowEnd,
    SUM(CASE WHEN CreatedOn = UpdatedOn THEN 1 ELSE 0 END) AS CreatedCount,
    SUM(CASE WHEN CreatedOn != UpdatedOn THEN 1 ELSE 0 END) AS UpdatedCount
FROM input1
GROUP BY
    Tumbling(hour,1)

Here giving you:

WindowStart WindowEnd CreatedCount UpdatedCount
2022-04-01T10:00:00 2022-04-01T11:00:00 1 1

Notice how the windowing is done here on the wall clock time (arrival time on the live service, or here query start time when I do a local run in VS Code), so my 2 operations are regrouped on a single window. When running live, if you don't have much delay between the operation time and when ASA receives the row (arrival time), they will surely be in separate time windows.

If you want to enforce time progression on the operation time, you can do so with the TIMESTAMP BY clause:

SELECT
    DATEADD(hour,-1,System.Timestamp()) AS WindowStart,
    System.Timestamp() AS WindowEnd,
    SUM(CASE WHEN CreatedOn = UpdatedOn THEN 1 ELSE 0 END) AS CreatedCount,
    SUM(CASE WHEN CreatedOn != UpdatedOn THEN 1 ELSE 0 END) AS UpdatedCount
FROM input1 TIMESTAMP BY COALESCE(UpdatedOn, CreatedOn)
GROUP BY
    Tumbling(hour,1)

Now the output becomes:

WindowStart WindowEnd CreatedCount UpdatedCount
2021-12-31T23:00:00 2022-01-01T00:00:00 1 0
2022-01-01T01:00:00 2022-01-01T02:00:00 0 1

Upvotes: 1

Related Questions