Georgy Nevsky
Georgy Nevsky

Reputation: 131

Azure Data Explorer (kusto) how to summarize by day and top with “others”?

I’m newbie in Kusto language – please help me to create query.

Here dataset:

let T = datatable(d:datetime , s:string)
[
    datetime(2019-10-01T00:01:00.00), "A",
    datetime(2019-10-01T00:02:00.00), "A",
    datetime(2019-10-01T00:03:00.00), "A",
    datetime(2019-10-02T00:01:00.00), "A",
    datetime(2019-10-02T00:02:00.00), "A",
    datetime(2019-10-02T00:03:00.00), "A",    
    datetime(2019-10-01T00:01:00.00), "C",    
    datetime(2019-10-01T00:02:00.00), "C",
    datetime(2019-10-02T00:01:00.00), "C",
    datetime(2019-10-02T00:02:00.00), "C",
    datetime(2019-10-01T00:01:00.00), "D",        
    datetime(2019-10-02T00:01:00.00), "D",    
    datetime(2019-10-01T00:01:00.00), "E",    
    datetime(2019-10-02T00:01:00.00), "E",                                
];

I wish to get top 2 of each “s” string with others and count summarized by day. I.e. result needs to be:

2019-10-01T00:00:00Z    A   3
2019-10-01T00:00:00Z    C   2
2019-10-01T00:00:00Z    Other   2
2019-10-02T00:00:00Z    A   3
2019-10-02T00:00:00Z    C   2
2019-10-02T00:00:00Z    Other   2

I think I close with query:

T
| summarize c = count() by bin(d, 1d), s
| top-nested of d by dummy0 = max(0)
| top-nested 2 of s with others = "Other" by c0 = sum(c);

But it doesn’t work.

Please advise.

Upvotes: 7

Views: 26288

Answers (2)

yifats
yifats

Reputation: 2744

Here's a way to do this with top-nested, which should perform better than your suggestion:

let T = datatable(d:datetime , s:string)
[
    datetime(2019-10-01T00:01:00.00), "A",
    datetime(2019-10-01T00:02:00.00), "A",
    datetime(2019-10-01T00:03:00.00), "A",
    datetime(2019-10-02T00:01:00.00), "A",
    datetime(2019-10-02T00:02:00.00), "A",
    datetime(2019-10-02T00:03:00.00), "A", 
    datetime(2019-10-01T00:01:00.00), "C",    
    datetime(2019-10-01T00:02:00.00), "C",
    datetime(2019-10-02T00:01:00.00), "C",
    datetime(2019-10-02T00:02:00.00), "C",
    datetime(2019-10-01T00:01:00.00), "D",        
    datetime(2019-10-02T00:01:00.00), "D",    
    datetime(2019-10-01T00:01:00.00), "E",    
    datetime(2019-10-02T00:01:00.00), "E",                                
];
T
| summarize c = count() by bin(d, 1d), s
| top-nested of d by dummy=max(0), top-nested 2 of s with others = "Others" by _count = sum(c)
| where _count > 0 | project-away dummy

enter image description here

Upvotes: 12

Georgy Nevsky
Georgy Nevsky

Reputation: 131

Found own way to accomplish that. Not sure how it is optimal.

let Q=T
|top-nested 2 of s with others = "" by c = count()
| project sq = s;
T
| join kind=leftouter Q on $left.s==$right.sq
| summarize c = count() by bin(d, 1d), s=sq
| project d,s=iif(isempty(s),"Other",s),c

Upvotes: 1

Related Questions