Reputation: 986
Below is the hypothetical schema.
Table : With billions of entries
Column: info
Column: id_1
Column: data_1
Column: Uid
Column: time_interval
What I am trying to query?
I need to get a list of Uid
from Table
which match some criteria like
time_interval
is just 5 minutesinfo
contains "info"id_
is equal to "Guid1"data_1
has this pattern "Something"Also, Multiple requests can span for each uid in Table
like
Uid time_interval ...other_colums....
=====================
uid1 00:00:12
uid1 00:00:13
uid1 00:00:15
Once I get the list of Uid and store it as scalar
Say it is [uid1, uid2, uid3].
Then, I need to query Table
again and compare each of the values in the list of scalars to find the
difference between the maximum and minimum time for each uid
Say for uid1 example above : the time difference would have: (00:00:15 - 00:00:12) milliseconds.
So my result for [uid1, uid2, uid3] should be:
Uid time_diff
==== =========
uid1 2 ms
uid2 3 ms
uid3 4 ms
I have the following query below for this, but the subquery which uses scalar just takes last value from the scalar defined "data". Please suggest how can I make the subquery iterate over the results in "data"
What I tried so far?
Also I cannot combine these into one because the data is such that after putting where clauses I just get one record and there is no time interval as such to calculate the difference. Thats why I need to first get all the ids and then query for time interval.
let data = toscalar(Table
| where time_interval between (ago(5m)..now())
| where info has "info"
| where id_1 == "Guid1"
| where data_1 matches regex "Something"
| project pack("uid", Uid));
Table
| where time_interval between (ago(5m)..now())
| where Uid in (data["uid"])
//| summarize dt1 = max(time_interval) - min(time_interval) by Uid
| project Uid, Rid, time_interval
Upvotes: 0
Views: 5477
Reputation: 2744
Please check if the below is what you're looking for. Note that the in() operator is limited to 1 million values, so if you have more distinct ids after the filter, you will need to use inner join between the 2 data sets (shuffle join might be helpful if cardinality is very high).
let T = datatable(time_interval:datetime, id_1:string, info:string, data_1:string)
[
datetime(2021-05-21 10:53:04), "Guid1", "no match", "this has something in it",
datetime(2021-05-21 10:53:06), "Guid1", "this has info in it", "this has something in it",
datetime(2021-05-21 10:53:09), "Guid1", "this has info in it", "this has something in it",
datetime(2021-05-21 10:53:04), "Guid2", "this has info in it", "this has something in it",
datetime(2021-05-21 10:53:08), "Guid2", "this has info in it", "this has something in it"
];
let ids = T
| where time_interval between (datetime(2021-05-21 10:50)..5m)
| where info has "info"
| where id_1 == "Guid1"
| where data_1 matches regex "something"
| distinct id_1;
T
| where id_1 in (ids)
| summarize min(time_interval), max(time_interval) by id_1
Upvotes: 3
Reputation: 7618
Seems like simply combining it into one query would work:
let Table = datatable(time_interval:datetime, id_1:string, info:string, data_1:string)
[datetime(2021-05-21 10:53:04), "Guid1", "this has info in it", "this has something in it",
datetime(2021-05-21 10:53:06), "Guid1", "this has info in it", "this has something in it",
datetime(2021-05-21 10:53:09), "Guid1", "this has info in it", "this has something in it",
datetime(2021-05-21 10:53:04), "Guid2", "this has info in it", "this has something in it",
datetime(2021-05-21 10:53:08), "Guid2", "this has info in it", "this has something in it"];
Table
| where time_interval between (ago(10d) .. now())
| where info has "info"
| where data_1 has "Something"
| summarize time_diff = max(time_interval) - min(time_interval) by id_1
| project id_1, time_diff
For example:
Upvotes: 1