Zack ISSOIR

Reputation: 974

Log analytics query optimization

I run this query on log analytics

    Perf
    | where TimeGenerated >= ago(5m)
    | join kind = inner (
        Heartbeat
        | where TimeGenerated >= ago(5m)
        | summarize arg_max(TimeGenerated, *) by SourceComputerId
    ) on Computer
    | summarize arg_max(TimeGenerated, *) by SourceComputerId, CounterName
    | extend Indicator = strcat(ObjectName, '-', CounterName)
    | summarize dict = make_dictionary(
        pack(
            'WorkspaceId', TenantId,
            Indicator, CounterValue,
            'TimeGenerated', TimeGenerated,
            'Computer', Computer
        )
    ) by SourceComputerId
    | evaluate bag_unpack(dict)

But it's a little bit slow. Is there any way to optimize it? I want the fastest possible query that achieves the same results.

Upvotes: 0

Views: 571

Answers (1)

Yoni L.

Reputation: 25995

It's hard to say without knowing the size of the data (e.g. the record count) for each of the join legs and the cardinality of the SourceComputerId column.

I would recommend that you go over the query best practices document, which covers several techniques for optimization, and see if that helps.

Update: explicitly mentioning the best practices that may be helpful in your case (for you to verify):

  • When using the join operator, choose the table with fewer rows to be the first one (left-most).
  • When the left side is small (up to 100,000 records) and the right side is big, it is recommended to use hint.strategy=broadcast.
  • When both sides of the join are too big and the join key has high cardinality, it is recommended to use hint.strategy=shuffle.
  • When the group-by keys of the summarize operator have high cardinality (best practice: above 1 million), it is recommended to use hint.strategy=shuffle.
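Applied to the query in the question, those hints could look roughly like the sketch below. This is only a sketch: it assumes the deduplicated Heartbeat set (one row per machine) is the smaller side and stays under ~100,000 records, which you would need to verify against your actual data; if both sides are large and Computer has high cardinality, hint.strategy=shuffle would be the alternative. Also note that swapping the join sides changes which table's duplicated columns (e.g. TimeGenerated vs. TimeGenerated1) survive unsuffixed, so the downstream arg_max and pack may need their column references adjusted.

```kusto
// Sketch: smaller (deduplicated Heartbeat) side on the left, broadcast join.
// Assumes the left side is small; use hint.strategy=shuffle if both are big.
Heartbeat
| where TimeGenerated >= ago(5m)
| summarize arg_max(TimeGenerated, *) by SourceComputerId
| join kind=inner hint.strategy=broadcast (
      Perf
      | where TimeGenerated >= ago(5m)
  ) on Computer
// After the side swap, Perf's columns may arrive suffixed (e.g. TimeGenerated1);
// verify which column the dedup and the dictionary should reference.
| summarize arg_max(TimeGenerated, *) by SourceComputerId, CounterName
| extend Indicator = strcat(ObjectName, '-', CounterName)
| summarize dict = make_dictionary(
      pack(
          'WorkspaceId', TenantId,
          Indicator, CounterValue,
          'TimeGenerated', TimeGenerated,
          'Computer', Computer
      )
  ) by SourceComputerId
| evaluate bag_unpack(dict)
```

Whether this is actually faster depends on the row counts of each leg, so it is worth checking the query statistics before and after the change.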

Upvotes: 1
