codefluent
codefluent

Reputation: 135

ADX Kusto find most recent rows for multiple id tuples

Given I have Devices with an ID and each Device has different telemetry aspects which are stored to an ADX table. Each of these aspects is identified by a TelemetryId. The DeviceId is unique, but the TelemetryId is not. This Device-Telemery tuples are stored as individual rows in ADX. So the table looks like:

|DeviceId|TelemetryId|Value|TimeStamp|
|--------|-----------|-----|----------------------------|
|1       | 1         | 10  | 2022-06-15 08:50:42.016496 |
|1       | 2         | 20  | 2022-06-15 08:50:41.072425 |
|1       | 3         | 30  | 2022-06-15 08:50:40.199198 |
|2       | 1         | 11  | 2022-06-15 08:50:39.366868 |
|2       | 2         | 22  | 2022-06-15 08:50:38.536320 |
|2       | 3         | 33  | 2022-06-15 08:50:37.602781 |
|2       | 4         | 44  | 2022-06-15 08:50:36.702367 |
|2       | 5         | 55  | 2022-06-15 08:50:35.677494 |
|1       | 1         | 12  | 2022-06-15 08:50:34.750020 |
|1       | 2         | 22  | 2022-06-15 08:50:33.752211 |

Now I need the information about the most recent data containing all Telementry data for a specific DeviceId. From the query consumer perspective, I know which DeviceId exist in the system and which TelemetryId are assigned to each DeviceId. So I can use this information to build the query.

So for example, if I would query for DeviceId == 1 I would expect this result:

|DeviceId|TelemetryId|Value|TimeStamp|
|--------|-----------|-----|----------------------------|
|1       | 1         | 10  | 2022-06-15 08:50:42.016496 |
|1       | 2         | 20  | 2022-06-15 08:50:41.072425 |
|1       | 3         | 30  | 2022-06-15 08:50:40.199198 |

But I’m not sure if this can be achieved with a single Kusto Query? (I know how to do it with multiple ADX queries (one query for each Device-Telemetry tuple) but I am wondering if it is possible to get it done with one Query).

Upvotes: 0

Views: 281

Answers (1)

David דודו Markovitz
David דודו Markovitz

Reputation: 44921

arg_max()

For performance reasons, you should consider narrowing the dataset by TimeStamp before applying the aggregation, E.g. querying only the last week/day/hour, depends on your data ingestion behaviour.

datatable (DeviceId:int, TelemetryId:int, Value:int, TimeStamp:datetime)
[
     1 ,1 ,10 ,"2022-06-15 08:50:42.016496"
    ,1 ,2 ,20 ,"2022-06-15 08:50:41.072425"
    ,1 ,3 ,30 ,"2022-06-15 08:50:40.199198"
    ,2 ,1 ,11 ,"2022-06-15 08:50:39.366868"
    ,2 ,2 ,22 ,"2022-06-15 08:50:38.536320"
    ,2 ,3 ,33 ,"2022-06-15 08:50:37.602781"
    ,2 ,4 ,44 ,"2022-06-15 08:50:36.702367"
    ,2 ,5 ,55 ,"2022-06-15 08:50:35.677494"
    ,1 ,1 ,12 ,"2022-06-15 08:50:34.750020"
    ,1 ,2 ,22 ,"2022-06-15 08:50:33.752211"
]
| summarize arg_max(TimeStamp, *) by DeviceId, TelemetryId
DeviceId TelemetryId TimeStamp Value
2 5 2022-06-15T08:50:35.677494Z 55
2 4 2022-06-15T08:50:36.702367Z 44
2 3 2022-06-15T08:50:37.602781Z 33
2 2 2022-06-15T08:50:38.53632Z 22
2 1 2022-06-15T08:50:39.366868Z 11
1 3 2022-06-15T08:50:40.199198Z 30
1 2 2022-06-15T08:50:41.072425Z 20
1 1 2022-06-15T08:50:42.016496Z 10

Fiddle

Upvotes: 1

Related Questions