Reputation: 1719
Given a query that looks like this:
SELECT
EventDate,
system.Timestamp as test
INTO
[azuretableoutput]
FROM
[csvdata] TIMESTAMP BY EventDate
According to documentation, EventDate should now be used as timestamp. However, when storing data into blobstorage with this path:
sadata/Y={datetime:yyyy}/M={datetime:MM}/D={datetime:dd}
I seem to still get ingested time. In my case, ingested time means nothing and I need to use EventDate for the path. Is this possible?
When checking data in Visual Studio, test and EventDate should be equal, however results look like this:
EventDate ;Test
2020-04-03T11:13:07.3670000Z;2020-04-09T02:16:15.5390000Z
2020-04-03T11:13:07.0460000Z;2020-04-09T02:16:15.5390000Z
2020-04-03T11:13:07.0460000Z;2020-04-09T02:16:15.5390000Z
2020-04-03T11:13:07.3670000Z;2020-04-09T02:16:15.5390000Z
2020-04-03T11:13:08.1470000Z;2020-04-09T02:16:15.5390000Z
Late tollerance arrival window is set as: 99:23:59:59 Out of order tollerance is set as: 00:00:00:00 with out of order action set to adjust.
When running same query in Stream Analytics on Azure i get this result:
[{"eventdate":"2020-04-03T11:13:20.1060000Z","test":"2020-04-03T11:13:20.1060000Z"},
{"eventdate":"2020-04-03T11:13:20.1060000Z","test":"2020-04-03T11:13:20.1060000Z"},
{"eventdate":"2020-04-03T11:13:20.1060000Z","test":"2020-04-03T11:13:20.1060000Z"}]
So far so good. When running the query with data on Azure it produces this path:
Y=2020/M=04/D=09
It should have produced this path: Y=2020/M=04/D=03 Interestingly enough, when checking the data that is actually stored in blobstorage I find this:
EventDate,test
2020-04-03T11:20:39.3100000Z,2020-04-09T19:33:35.3870000Z,
System.timestamp seems to only be altered when testing the query on sampled data, but is not actually altered when the query is running normally and receiving data.
I have tested this with late arrival setting set to 0 and 20 days. In reality I need to disable late arrival adjustment as I might get events that are years old through the pipeline.
Upvotes: 4
Views: 402
Reputation: 1290
This issue has been brought up and closed on the MicrosoftDocs GitHub
The Microsoft folks say:
Maximum days for late arrival is 20, so if the policy is set to 99:23:59:59 (99 days). The adjustment could be causing a discrepancy in System.Timestamp.
By definition of late arrival tolerance window, for each incoming event, Azure Stream Analytics compares the event time with the arrival time; if the event time is outside of the tolerance window, you can configure the system to either drop the event or adjust the event’s time to be within the tolerance.
Consider that after watermarks are generated, the service can potentially receive events with event time lower than the watermark. You can configure the service to either drop those events, or adjust the event’s time to the watermark value.
As a part of the adjustment, the event’s System.Timestamp is set to the new value, but the event time field itself is not changed. This adjustment is the only situation where an event’s System.Timestamp can be different from the value in the event time field, and may cause unexpected results to be generated.
For more information, please see Understand time handling in Azure Stream Analytics.
Unfortunately, testing with sample data in Azure portal doesn't take policies into account at this time.
Potentially other helpful resources:
Upvotes: 1