Reputation: 495
I have an application that contains an object. It has 2 guid values, guid1 and guid2. At any given time only 1 is populated. Firstly guid1 and at a later point in time guid2. State changes are recorded in a table below. I'd like to group the records so that all guids for an object are grouped together.
So this table ...
timestamp guid1 guid2 text
2022-05-06T10:00:31.5767324Z | cb73c58e-e36b-4fe3-8663-33027ba2afc7 | null | abc1
2022-05-06T10:00:31.5767324Z | ec5d1b9395444a06a36130a9d62ae2c5 | null | abc2
2022-05-06T10:01:31.5767324Z | cb73c58e-e36b-4fe3-8663-33027ba2afc7 | b7ef78cde158437fb65a6878ca908751| def1
2022-05-06T10:01:31.5767324Z | ec5d1b9395444a06a36130a9d62ae2c5 | 206eb977459c4f91bafb9b798f5d60c4| def2
2022-05-06T10:02:31.5767324Z | null | b7ef78cde158437fb65a6878ca908751| ghi1
2022-05-06T10:02:31.5767324Z | null | 206eb977459c4f91bafb9b798f5d60c4| ghi2
... becomes this set of query results
timestamp guid1 guid2 text
2022-05-06T10:00:31.5767324Z | cb73c58e-e36b-4fe3-8663-33027ba2afc7 | null | abc1
2022-05-06T10:01:31.5767324Z | cb73c58e-e36b-4fe3-8663-33027ba2afc7 | b7ef78cde158437fb65a6878ca908751| def1
2022-05-06T10:02:31.5767324Z | null | b7ef78cde158437fb65a6878ca908751| ghi1
2022-05-06T10:00:31.5767324Z | ec5d1b9395444a06a36130a9d62ae2c5 | null | abc2
2022-05-06T10:01:31.5767324Z | ec5d1b9395444a06a36130a9d62ae2c5 | 206eb977459c4f91bafb9b798f5d60c4| def2
2022-05-06T10:02:31.5767324Z | null | 206eb977459c4f91bafb9b798f5d60c4| ghi2
From the desired output above, just taking the first 3 records for example, they are now grouped to show the full state change history. First record shows guid1 had a value of cb73c58e-e36b-4fe3-8663-33027ba2afc7, then a record to show guid2 activation and then finally guid1 is null and only guid2 exists. The same correct grouping can be seen in the last 3 records for guid1 ec5d1b9395444a06a36130a9d62ae2c5
I was having difficulty attempting this in sql never mind kql, I asked the question on a seperate sql thread and the suggested solution was below which I was having difficulty converting to kql.
select *
from t
order by Row_Number() over(partition by [timestamp] order by [timestamp]),
guid1 desc, guid2;
row_number and partition by are supported but I've not seen reference to over so I'm unsure how to achieve this.
Perhaps there is a more kql friendly way of achieving this?
[COMMENTED SOLUTION]
let t = datatable(timestamp:datetime,guid1:string,guid2:string,text:string)
[
'2022-05-06T10:00:31.5767324Z' ,'cb73c58e-e36b-4fe3-8663-33027ba2afc7' ,'' ,'abc1'
,'2022-05-06T10:00:31.5767324Z' ,'ec5d1b9395444a06a36130a9d62ae2c5' ,'' ,'abc2'
,'2022-05-06T10:01:31.5767324Z' ,'cb73c58e-e36b-4fe3-8663-33027ba2afc7' ,'b7ef78cde158437fb65a6878ca908751' ,'def1'
,'2022-05-06T10:01:31.5767324Z' ,'ec5d1b9395444a06a36130a9d62ae2c5' ,'206eb977459c4f91bafb9b798f5d60c4' ,'def2'
,'2022-05-06T10:02:31.5767324Z' ,'' ,'b7ef78cde158437fb65a6878ca908751' ,'ghi1'
,'2022-05-06T10:02:31.5767324Z' ,'' ,'206eb977459c4f91bafb9b798f5d60c4' ,'ghi2'
,'2022-05-06T10:03:31.5767324Z' ,'fee3d3522a3942a69802774f8a5128ff' ,'' ,'xxx1'
,'2022-05-06T10:04:31.5767324Z' ,'48b04d074cd141dba6eb9a354d26be0a' ,'' ,'yyy1'
,'2022-05-06T10:04:31.5767324Z' ,'48b04d074cd141dba6eb9a354d26be0a' ,'0d2ac92589634b27a171be005375b1b5' ,'yyy2'
];
t
| where isnotempty(guid1)
// select records where guid1 is not empty
| summarize take_any(guid2) by guid1
// of that recordset, reduce to 2 columns of interest and select records where the accompanying guid2 is not empty. That's default take_any behaviour
| serialize
// serialize the recordset to enable the use of window functions later in query
| extend gid = row_number()
// mark this recordset with a parent guid row identifier
| mv-expand g = pack_array(guid1, guid2) to typeof(string)
// push recordset into an array
| where isnotempty(g)
// continue in execution if the array is populated
| project g, gid
// reduce recordset to array of related guid1/guid2 and associated parent guid row identifier
| join kind=inner (t | extend g = coalesce(guid1, guid2)) on g
// inner join on original recordset
| project-away g, g1
// exclude g, g1 columns from recordset
| partition hint.strategy=native by gid
(
order by gid asc, iff(isnotempty(guid1), 1, 2) asc, iff(isempty(guid2), 1, 2) asc
| extend rid = row_number()
)
// - partition the recordset by gid (to group related parent guid records),
// - order by gid with an order preference of non empty guid1Ids/empty guid2Ids over empty guid1Ids/non empty guid2Ids
// - mark each record with a row id
| order by gid asc, rid asc
// order recordset
| project-reorder gid, rid
// reorder gid column to appear before rid
Upvotes: 1
Views: 704
Reputation: 44991
The assumption is that if we have a record with a guid2 and an empty guid1, we also have a record with that guid2 and a non-empty guid1.
Create a dictionary of guid
We take all records that have guid1.
From those records, we keep only 1 record per guid1, preferably with guid2.
We number those records arbitrarily. These numbers will later represent groups of records (gid = group id).
We now duplicate all records that have both guid1 & guid2, for one record we keep guid1 and for the other - guid2. Note that those records will have the same record number.
Match each original table record with a guid from the dictionary
If a record has guid1 (and maybe also guid2) we connect it to guid1 from the dictionary.
If a record has no guid1, we connect it to guid2.
At this point, each record as a gid. records that their guid1 & guid2 are connected have the same gid.
Optional. Number the records within each group, first records with guid1, followed by records with both guid, and ending with records with only guid2
let t = datatable(timestamp:datetime,guid1:string,guid2:string,text:string)
[
'2022-05-06T10:00:31.5767324Z' ,'cb73c58e-e36b-4fe3-8663-33027ba2afc7' ,'' ,'abc1'
,'2022-05-06T10:00:31.5767324Z' ,'ec5d1b9395444a06a36130a9d62ae2c5' ,'' ,'abc2'
,'2022-05-06T10:01:31.5767324Z' ,'cb73c58e-e36b-4fe3-8663-33027ba2afc7' ,'b7ef78cde158437fb65a6878ca908751' ,'def1'
,'2022-05-06T10:01:31.5767324Z' ,'ec5d1b9395444a06a36130a9d62ae2c5' ,'206eb977459c4f91bafb9b798f5d60c4' ,'def2'
,'2022-05-06T10:02:31.5767324Z' ,'' ,'b7ef78cde158437fb65a6878ca908751' ,'ghi1'
,'2022-05-06T10:02:31.5767324Z' ,'' ,'206eb977459c4f91bafb9b798f5d60c4' ,'ghi2'
,'2022-05-06T10:03:31.5767324Z' ,'fee3d3522a3942a69802774f8a5128ff' ,'' ,'xxx1'
,'2022-05-06T10:04:31.5767324Z' ,'48b04d074cd141dba6eb9a354d26be0a' ,'' ,'yyy1'
,'2022-05-06T10:04:31.5767324Z' ,'48b04d074cd141dba6eb9a354d26be0a' ,'0d2ac92589634b27a171be005375b1b5' ,'yyy2'
];
t
| where isnotempty(guid1)
| summarize take_any(guid2) by guid1
| serialize
| extend gid = row_number()
| mv-expand g = pack_array(guid1, guid2) to typeof(string)
| where isnotempty(g)
| project g, gid
| join kind=inner (t | extend g = coalesce(guid1, guid2)) on g
| project-away g, g1
| partition hint.strategy=native by gid
(
order by gid asc, iff(isnotempty(guid1), 1, 2) asc, iff(isempty(guid2), 1, 2) asc
| extend rid = row_number()
)
| order by gid asc, rid asc
| project-reorder gid, rid
gid | rid | timestamp | guid1 | guid2 | text |
---|---|---|---|---|---|
1 | 1 | 2022-05-06T10:00:31.5767324Z | cb73c58e-e36b-4fe3-8663-33027ba2afc7 | abc1 | |
1 | 2 | 2022-05-06T10:01:31.5767324Z | cb73c58e-e36b-4fe3-8663-33027ba2afc7 | b7ef78cde158437fb65a6878ca908751 | def1 |
1 | 3 | 2022-05-06T10:02:31.5767324Z | b7ef78cde158437fb65a6878ca908751 | ghi1 | |
2 | 1 | 2022-05-06T10:00:31.5767324Z | ec5d1b9395444a06a36130a9d62ae2c5 | abc2 | |
2 | 2 | 2022-05-06T10:01:31.5767324Z | ec5d1b9395444a06a36130a9d62ae2c5 | 206eb977459c4f91bafb9b798f5d60c4 | def2 |
2 | 3 | 2022-05-06T10:02:31.5767324Z | 206eb977459c4f91bafb9b798f5d60c4 | ghi2 | |
3 | 1 | 2022-05-06T10:03:31.5767324Z | fee3d3522a3942a69802774f8a5128ff | xxx1 | |
4 | 1 | 2022-05-06T10:04:31.5767324Z | 48b04d074cd141dba6eb9a354d26be0a | yyy1 | |
4 | 2 | 2022-05-06T10:04:31.5767324Z | 48b04d074cd141dba6eb9a354d26be0a | 0d2ac92589634b27a171be005375b1b5 | yyy2 |
Upvotes: 1