Kusto query (KQL) iterate over scalar values in subquery

Question

Below is the hypothetical schema.

Table : With billions of entries
Column: info 
Column: id_1
Column: data_1
Column: Uid
Column: time_interval

What I am trying to query?

I need to get a list of Uid from Table which match some criteria like

the time_interval is just 5 minutes
the info contains "info"
the id_ is equal to "Guid1"
the data_1 has this pattern "Something"

Also, Multiple requests can span for each uid in Table like

  Uid    time_interval   ...other_colums....
=====================
  uid1   00:00:12  
  uid1   00:00:13
  uid1   00:00:15

Once I get the list of Uid and store it as scalar Say it is [uid1, uid2, uid3]. Then, I need to query Table again and compare each of the values in the list of scalars to find the difference between the maximum and minimum time for each uid Say for uid1 example above : the time difference would have: (00:00:15 - 00:00:12) milliseconds.

So my result for [uid1, uid2, uid3] should be:

Uid      time_diff
====     =========
uid1      2 ms
uid2      3 ms
uid3      4 ms

I have the following query below for this, but the subquery which uses scalar just takes last value from the scalar defined "data". Please suggest how can I make the subquery iterate over the results in "data"

What I tried so far?

Also I cannot combine these into one because the data is such that after putting where clauses I just get one record and there is no time interval as such to calculate the difference. Thats why I need to first get all the ids and then query for time interval.

let data = toscalar(Table
| where time_interval between (ago(5m)..now())
| where info has "info"
| where id_1 == "Guid1"
| where data_1 matches regex "Something"
| project pack("uid", Uid));

Table
| where time_interval between (ago(5m)..now())
| where Uid in (data["uid"])
//| summarize dt1 = max(time_interval) - min(time_interval) by Uid
| project Uid, Rid, time_interval

yifats · Accepted Answer

Please check if the below is what you're looking for. Note that the in() operator is limited to 1 million values, so if you have more distinct ids after the filter, you will need to use inner join between the 2 data sets (shuffle join might be helpful if cardinality is very high).

let T = datatable(time_interval:datetime, id_1:string, info:string, data_1:string) 
[
    datetime(2021-05-21 10:53:04), "Guid1", "no match", "this has something in it",
    datetime(2021-05-21 10:53:06), "Guid1", "this has info in it", "this has something in it",
    datetime(2021-05-21 10:53:09), "Guid1", "this has info in it", "this has something in it",
    datetime(2021-05-21 10:53:04), "Guid2", "this has info in it", "this has something in it",
    datetime(2021-05-21 10:53:08), "Guid2", "this has info in it", "this has something in it"
];
let ids = T
| where time_interval between (datetime(2021-05-21 10:50)..5m)
| where info has "info"
| where id_1 == "Guid1"
| where data_1 matches regex "something"
| distinct id_1;
T 
| where id_1 in (ids)
| summarize min(time_interval), max(time_interval) by id_1

Kusto query (KQL) iterate over scalar values in subquery

Answers (2)

Related Questions