Convert query to KQL

Question

I have an application that contains an object. It has 2 guid values, guid1 and guid2. At any given time only 1 is populated. Firstly guid1 and at a later point in time guid2. State changes are recorded in a table below. I'd like to group the records so that all guids for an object are grouped together.

So this table ...

timestamp                      guid1                                  guid2                             text
2022-05-06T10:00:31.5767324Z | cb73c58e-e36b-4fe3-8663-33027ba2afc7 | null                            | abc1  
2022-05-06T10:00:31.5767324Z | ec5d1b9395444a06a36130a9d62ae2c5     | null                            | abc2
2022-05-06T10:01:31.5767324Z | cb73c58e-e36b-4fe3-8663-33027ba2afc7 | b7ef78cde158437fb65a6878ca908751| def1
2022-05-06T10:01:31.5767324Z | ec5d1b9395444a06a36130a9d62ae2c5     | 206eb977459c4f91bafb9b798f5d60c4| def2
2022-05-06T10:02:31.5767324Z | null                                 | b7ef78cde158437fb65a6878ca908751| ghi1
2022-05-06T10:02:31.5767324Z | null                                 | 206eb977459c4f91bafb9b798f5d60c4| ghi2

... becomes this set of query results

timestamp                      guid1                                  guid2                             text
2022-05-06T10:00:31.5767324Z | cb73c58e-e36b-4fe3-8663-33027ba2afc7 | null                            | abc1  
2022-05-06T10:01:31.5767324Z | cb73c58e-e36b-4fe3-8663-33027ba2afc7 | b7ef78cde158437fb65a6878ca908751| def1
2022-05-06T10:02:31.5767324Z | null                                 | b7ef78cde158437fb65a6878ca908751| ghi1    
2022-05-06T10:00:31.5767324Z | ec5d1b9395444a06a36130a9d62ae2c5     | null                            | abc2
2022-05-06T10:01:31.5767324Z | ec5d1b9395444a06a36130a9d62ae2c5     | 206eb977459c4f91bafb9b798f5d60c4| def2
2022-05-06T10:02:31.5767324Z | null                                 | 206eb977459c4f91bafb9b798f5d60c4| ghi2

From the desired output above, just taking the first 3 records for example, they are now grouped to show the full state change history. First record shows guid1 had a value of cb73c58e-e36b-4fe3-8663-33027ba2afc7, then a record to show guid2 activation and then finally guid1 is null and only guid2 exists. The same correct grouping can be seen in the last 3 records for guid1 ec5d1b9395444a06a36130a9d62ae2c5

I was having difficulty attempting this in sql never mind kql, I asked the question on a seperate sql thread and the suggested solution was below which I was having difficulty converting to kql.

select * 
from t
order by Row_Number() over(partition by [timestamp] order by [timestamp]),
guid1 desc, guid2;

row_number and partition by are supported but I've not seen reference to over so I'm unsure how to achieve this.

Perhaps there is a more kql friendly way of achieving this?

[COMMENTED SOLUTION]

let t = datatable(timestamp:datetime,guid1:string,guid2:string,text:string)
[
     '2022-05-06T10:00:31.5767324Z' ,'cb73c58e-e36b-4fe3-8663-33027ba2afc7' ,''                                 ,'abc1'  
    ,'2022-05-06T10:00:31.5767324Z' ,'ec5d1b9395444a06a36130a9d62ae2c5'     ,''                                 ,'abc2'
    ,'2022-05-06T10:01:31.5767324Z' ,'cb73c58e-e36b-4fe3-8663-33027ba2afc7' ,'b7ef78cde158437fb65a6878ca908751' ,'def1'
    ,'2022-05-06T10:01:31.5767324Z' ,'ec5d1b9395444a06a36130a9d62ae2c5'     ,'206eb977459c4f91bafb9b798f5d60c4' ,'def2'
    ,'2022-05-06T10:02:31.5767324Z' ,''                                     ,'b7ef78cde158437fb65a6878ca908751' ,'ghi1'
    ,'2022-05-06T10:02:31.5767324Z' ,''                                     ,'206eb977459c4f91bafb9b798f5d60c4' ,'ghi2'
    ,'2022-05-06T10:03:31.5767324Z' ,'fee3d3522a3942a69802774f8a5128ff'     ,''                                 ,'xxx1'
    ,'2022-05-06T10:04:31.5767324Z' ,'48b04d074cd141dba6eb9a354d26be0a'     ,''                                 ,'yyy1'
    ,'2022-05-06T10:04:31.5767324Z' ,'48b04d074cd141dba6eb9a354d26be0a'     ,'0d2ac92589634b27a171be005375b1b5' ,'yyy2'
];
t
| where isnotempty(guid1)
// select records where guid1 is not empty

| summarize take_any(guid2) by guid1
// of that recordset, reduce to 2 columns of interest and select records where the accompanying guid2 is not empty. That's default take_any behaviour

| serialize 
// serialize the recordset to enable the use of window functions later in query

| extend gid = row_number() 
// mark this recordset with a parent guid row identifier

| mv-expand g = pack_array(guid1, guid2) to typeof(string)
// push recordset into an array

| where isnotempty(g)
// continue in execution if the array is populated

| project g, gid 
// reduce recordset to array of related guid1/guid2 and associated parent guid row identifier

| join kind=inner (t | extend g = coalesce(guid1, guid2)) on g
// inner join on original recordset 

| project-away g, g1
// exclude g, g1 columns from recordset

| partition hint.strategy=native by gid
  (
      order by gid asc, iff(isnotempty(guid1), 1, 2) asc, iff(isempty(guid2), 1, 2) asc
    | extend rid = row_number()
  )
// - partition the recordset by gid (to group related parent guid records),
// - order by gid with an order preference of non empty guid1Ids/empty guid2Ids over empty guid1Ids/non empty guid2Ids
// - mark each record with a row id

| order by gid asc, rid asc 
// order recordset

| project-reorder gid, rid
// reorder gid column to appear before rid

David דודו Markovitz · Accepted Answer

The assumption is that if we have a record with a guid2 and an empty guid1, we also have a record with that guid2 and a non-empty guid1.

Part 1

Create a dictionary of guid
We take all records that have guid1.
From those records, we keep only 1 record per guid1, preferably with guid2.
We number those records arbitrarily. These numbers will later represent groups of records (gid = group id).
We now duplicate all records that have both guid1 & guid2, for one record we keep guid1 and for the other - guid2. Note that those records will have the same record number.

Part 2

Match each original table record with a guid from the dictionary
If a record has guid1 (and maybe also guid2) we connect it to guid1 from the dictionary.
If a record has no guid1, we connect it to guid2.
At this point, each record as a gid. records that their guid1 & guid2 are connected have the same gid.

Part 3

Optional. Number the records within each group, first records with guid1, followed by records with both guid, and ending with records with only guid2

let t = datatable(timestamp:datetime,guid1:string,guid2:string,text:string)
[
     '2022-05-06T10:00:31.5767324Z' ,'cb73c58e-e36b-4fe3-8663-33027ba2afc7' ,''                                 ,'abc1'  
    ,'2022-05-06T10:00:31.5767324Z' ,'ec5d1b9395444a06a36130a9d62ae2c5'     ,''                                 ,'abc2'
    ,'2022-05-06T10:01:31.5767324Z' ,'cb73c58e-e36b-4fe3-8663-33027ba2afc7' ,'b7ef78cde158437fb65a6878ca908751' ,'def1'
    ,'2022-05-06T10:01:31.5767324Z' ,'ec5d1b9395444a06a36130a9d62ae2c5'     ,'206eb977459c4f91bafb9b798f5d60c4' ,'def2'
    ,'2022-05-06T10:02:31.5767324Z' ,''                                     ,'b7ef78cde158437fb65a6878ca908751' ,'ghi1'
    ,'2022-05-06T10:02:31.5767324Z' ,''                                     ,'206eb977459c4f91bafb9b798f5d60c4' ,'ghi2'
    ,'2022-05-06T10:03:31.5767324Z' ,'fee3d3522a3942a69802774f8a5128ff'     ,''                                 ,'xxx1'
    ,'2022-05-06T10:04:31.5767324Z' ,'48b04d074cd141dba6eb9a354d26be0a'     ,''                                 ,'yyy1'
    ,'2022-05-06T10:04:31.5767324Z' ,'48b04d074cd141dba6eb9a354d26be0a'     ,'0d2ac92589634b27a171be005375b1b5' ,'yyy2'
];
t
| where isnotempty(guid1)
| summarize take_any(guid2) by guid1
| serialize 
| extend gid = row_number() 
| mv-expand g = pack_array(guid1, guid2) to typeof(string)
| where isnotempty(g)
| project g, gid 
| join kind=inner (t | extend g = coalesce(guid1, guid2)) on g
| project-away g, g1
| partition hint.strategy=native by gid
  (
      order by gid asc, iff(isnotempty(guid1), 1, 2) asc, iff(isempty(guid2), 1, 2) asc
    | extend rid = row_number()
  )
| order by gid asc, rid asc 
| project-reorder gid, rid

gid	rid	timestamp	guid1	guid2	text
1	1	2022-05-06T10:00:31.5767324Z	cb73c58e-e36b-4fe3-8663-33027ba2afc7		abc1
1	2	2022-05-06T10:01:31.5767324Z	cb73c58e-e36b-4fe3-8663-33027ba2afc7	b7ef78cde158437fb65a6878ca908751	def1
1	3	2022-05-06T10:02:31.5767324Z		b7ef78cde158437fb65a6878ca908751	ghi1
2	1	2022-05-06T10:00:31.5767324Z	ec5d1b9395444a06a36130a9d62ae2c5		abc2
2	2	2022-05-06T10:01:31.5767324Z	ec5d1b9395444a06a36130a9d62ae2c5	206eb977459c4f91bafb9b798f5d60c4	def2
2	3	2022-05-06T10:02:31.5767324Z		206eb977459c4f91bafb9b798f5d60c4	ghi2
3	1	2022-05-06T10:03:31.5767324Z	fee3d3522a3942a69802774f8a5128ff		xxx1
4	1	2022-05-06T10:04:31.5767324Z	48b04d074cd141dba6eb9a354d26be0a		yyy1
4	2	2022-05-06T10:04:31.5767324Z	48b04d074cd141dba6eb9a354d26be0a	0d2ac92589634b27a171be005375b1b5	yyy2

Fiddle

Convert query to KQL

Answers (1)

Part 1

Part 2

Part 3

Related Questions