Reputation: 41
I have input with a known set of columns (say A-C) and a variable number of additional columns. I would like to ingest this into a table with columns A-C plus a column theRest. So from the input A goes into A, B into B etc, and if D exists in a record it would go to theRest as "D":"valueofD". For another record containing X and Y it would become A, B, C, theRest=["X":"valueofX", "Y":"valueofY"]
I'm trying something along the lines of
T | project A, B, C, theRest = bag_remove_keys(pack_all(), dynamic(["A","B","C"]))
which basically does what I want, except that (ofcourse) my case is a bit more complex and instead of handcoding dynamic(["A","B","C","D","E"]
, I would like to create something for dynamic(<expression>)
where <expression>
is the array of column names in the tabular expression T
. The expression T
has a varying number of columns, so the columns to remove by bag_remove_keys()
vary also.
Edit after first answer from @David:
My example was maybe too simple. Here is my code which almost does what I want, except the formatting of theRest.
let T= datatable (data: dynamic)
[
dynamic([{"A":1,"B":2,"C":3}]),
dynamic([{"X":4,"A":1,"B":5,"C":3,"Y":2}]),
dynamic([{"A":3,"G":7,"B":2,"C":1}]),
];
let X = datatable(A:int, B:int, C:int, data:dynamic) [];
let T_base_cols = toscalar(X | getschema | summarize make_list(ColumnName));
T | mv-expand Results = data
| project A=toint(Results['A']),
B=toint(Results['B']),
C=toint(Results['C']),
theRest = bag_remove_keys(pack_all(), T_base_cols)
This gives output like this
A B C theRest
1 2 3 {"Results":{"A":1,"B":2,"C":3}}
1 5 3 {"Results":{"X":4,"A":1,"B":5,"C":3,"Y":2}}
3 2 1 {"Results":{"A":3,"G":7,"B":2,"C":1}}
and I am expecting
A B C theRest
1 2 3
1 5 3 {"X":4,"Y":2}
3 2 1 {"G":7}
or similar format without curly braces in the last column.
Edit2:
After the excellent second answer from @David I came up with this
let T= datatable (data: dynamic)
[
dynamic([{"A":1,"B":2,"C":3}]),
dynamic([{"X":4,"A":1,"B":5,"C":3,"Y":2}]),
dynamic([{"A":3,"G":7,"B":2,"C":1}]),
];
let T_base_cols = dynamic(["data","$.Results.A",'$.Results.B','$.Results.C']);
//print T_base_cols
T | mv-expand Results = data
| project A=toint(Results['A']),
B=toint(Results['B']),
C=toint(Results['C']),
theRest=trim('{"Results":|}$',tostring( bag_remove_keys(pack_all(),T_base_cols)))
which yields
1 2 3 {}
1 5 3 {"X":4,"Y":2}
3 2 1 {"G":7}
I will do some testing to find out if there is any performance implications of either solution (I need to ingest and transfrom about 160TB of data into Azure Data Explorer :-))
I am very happy with this forum!
Upvotes: 0
Views: 487
Reputation: 44981
If the target table columns (A, B & C) are known in advance, it becomes quite straightforward.
The last line is for replacing an empty bag ({}
) with a dynamic null.
let source = datatable (data: dynamic)
[
dynamic([{"A":1,"B":2,"C":3}])
,dynamic([{"X":4,"A":1,"B":5,"C":3,"Y":2}])
,dynamic([{"A":3,"G":7,"B":2,"C":1}])
];
let target = datatable(A:int, B:int, C:int, data:dynamic)[];
let target_fixed_keys = dynamic(["A", "B", "C"]);
source
| mv-expand data
| project A = data.A
,B = data.B
,C = data.C
,theRest = bag_remove_keys(data, target_fixed_keys)
| extend theRest = iff(array_length(bag_keys(theRest)) == 0, dynamic(null), theRest)
A | B | C | theRest |
---|---|---|---|
1 | 2 | 3 | |
1 | 5 | 3 | {"X":4,"Y":2} |
3 | 2 | 1 | {"G":7} |
Upvotes: 1
Reputation: 44981
Here is a completely dynamic solution.
I have changed the tables names to source and target so it will be easier to understand what is going on in the context of update policy.
let source = datatable (data: dynamic)
[
dynamic([{"A":1,"B":2,"C":3}])
,dynamic([{"X":4,"A":1,"B":5,"C":3,"Y":2}])
,dynamic([{"A":3,"G":7,"B":2,"C":1}])
];
let target = datatable(A:int, B:int, C:int, data:dynamic)[];
let target_fixed_cols = toscalar
(
target
| getschema
| summarize make_list_if(ColumnName, ColumnName != "data")
);
source
| mv-expand data
| extend target_rest_keys = set_difference(bag_keys(data), target_fixed_cols)
| extend target_fixed_bag = bag_remove_keys(data, target_rest_keys)
| evaluate bag_unpack(target_fixed_bag)
| extend theRest = bag_remove_keys(data, target_fixed_cols)
| extend theRest = iff(array_length(bag_keys(theRest)) == 0, dynamic(null), theRest)
| project-away data, target_rest_keys
A | B | C | theRest |
---|---|---|---|
1 | 2 | 3 | |
1 | 5 | 3 | {"X":4,"Y":2} |
3 | 2 | 1 | {"G":7} |
Upvotes: 1