AndrePKI
AndrePKI

Reputation: 41

KQL / Kusto: Can I consolidate the "remaining" columns into one catchall column?

I have input with a known set of columns (say A-C) and a variable number of additional columns. I would like to ingest this into a table with columns A-C plus a column theRest. So from the input A goes into A, B into B etc, and if D exists in a record it would go to theRest as "D":"valueofD". For another record containing X and Y it would become A, B, C, theRest=["X":"valueofX", "Y":"valueofY"]

I'm trying something along the lines of

T | project A, B, C, theRest = bag_remove_keys(pack_all(), dynamic(["A","B","C"]))

which basically does what I want, except that (ofcourse) my case is a bit more complex and instead of handcoding dynamic(["A","B","C","D","E"], I would like to create something for dynamic(<expression>) where <expression> is the array of column names in the tabular expression T. The expression T has a varying number of columns, so the columns to remove by bag_remove_keys() vary also.

Edit after first answer from @David:

My example was maybe too simple. Here is my code which almost does what I want, except the formatting of theRest.

let T= datatable (data: dynamic)
[
dynamic([{"A":1,"B":2,"C":3}]),
dynamic([{"X":4,"A":1,"B":5,"C":3,"Y":2}]),
dynamic([{"A":3,"G":7,"B":2,"C":1}]),
];
let X = datatable(A:int, B:int, C:int, data:dynamic) [];
let T_base_cols = toscalar(X | getschema | summarize make_list(ColumnName));
T | mv-expand Results = data
   | project  A=toint(Results['A']),
              B=toint(Results['B']),
              C=toint(Results['C']),
              theRest = bag_remove_keys(pack_all(), T_base_cols)

This gives output like this

A   B   C   theRest
1   2   3   {"Results":{"A":1,"B":2,"C":3}}
1   5   3   {"Results":{"X":4,"A":1,"B":5,"C":3,"Y":2}}
3   2   1   {"Results":{"A":3,"G":7,"B":2,"C":1}}

and I am expecting

A   B   C   theRest
1   2   3   
1   5   3   {"X":4,"Y":2}
3   2   1   {"G":7}

or similar format without curly braces in the last column.

Edit2:

After the excellent second answer from @David I came up with this

let T= datatable (data: dynamic)
[
dynamic([{"A":1,"B":2,"C":3}]),
dynamic([{"X":4,"A":1,"B":5,"C":3,"Y":2}]),
dynamic([{"A":3,"G":7,"B":2,"C":1}]),
];
let T_base_cols = dynamic(["data","$.Results.A",'$.Results.B','$.Results.C']);
//print T_base_cols
T | mv-expand Results = data
   | project  A=toint(Results['A']),
              B=toint(Results['B']),
              C=toint(Results['C']),
              theRest=trim('{"Results":|}$',tostring( bag_remove_keys(pack_all(),T_base_cols)))

which yields

1   2   3   {}
1   5   3   {"X":4,"Y":2}
3   2   1   {"G":7}

I will do some testing to find out if there is any performance implications of either solution (I need to ingest and transfrom about 160TB of data into Azure Data Explorer :-))

I am very happy with this forum!

Upvotes: 0

Views: 487

Answers (2)

David דודו Markovitz
David דודו Markovitz

Reputation: 44981

If the target table columns (A, B & C) are known in advance, it becomes quite straightforward.
The last line is for replacing an empty bag ({}) with a dynamic null.

let source = datatable (data: dynamic)
[
    dynamic([{"A":1,"B":2,"C":3}])
   ,dynamic([{"X":4,"A":1,"B":5,"C":3,"Y":2}])
   ,dynamic([{"A":3,"G":7,"B":2,"C":1}])
];
let target = datatable(A:int, B:int, C:int, data:dynamic)[];
let target_fixed_keys = dynamic(["A", "B", "C"]);
source
| mv-expand data
| project A = data.A
         ,B = data.B
         ,C = data.C
         ,theRest = bag_remove_keys(data, target_fixed_keys)
| extend theRest = iff(array_length(bag_keys(theRest)) == 0, dynamic(null), theRest)
A B C theRest
1 2 3
1 5 3 {"X":4,"Y":2}
3 2 1 {"G":7}

Fiddle

Upvotes: 1

David דודו Markovitz
David דודו Markovitz

Reputation: 44981

Here is a completely dynamic solution.

I have changed the tables names to source and target so it will be easier to understand what is going on in the context of update policy.

let source = datatable (data: dynamic)
[
    dynamic([{"A":1,"B":2,"C":3}])
   ,dynamic([{"X":4,"A":1,"B":5,"C":3,"Y":2}])
   ,dynamic([{"A":3,"G":7,"B":2,"C":1}])
];
let target = datatable(A:int, B:int, C:int, data:dynamic)[];
let target_fixed_cols = toscalar
(
    target 
    | getschema 
    | summarize make_list_if(ColumnName, ColumnName != "data")
);
source
| mv-expand data
| extend target_rest_keys = set_difference(bag_keys(data), target_fixed_cols)
| extend target_fixed_bag = bag_remove_keys(data, target_rest_keys)
| evaluate bag_unpack(target_fixed_bag)
| extend theRest = bag_remove_keys(data, target_fixed_cols)
| extend theRest = iff(array_length(bag_keys(theRest)) == 0, dynamic(null), theRest)
| project-away data, target_rest_keys
A B C theRest
1 2 3
1 5 3 {"X":4,"Y":2}
3 2 1 {"G":7}

Fiddle

Upvotes: 1

Related Questions