Reputation: 145
I have two options to query my data. Which one is preferred from a performance perspective?
My event records have a unique session ID, a device name, and more details about the device that sends the event.
I want to count the events in each session and get the device info for each count.
Option 1: add all device columns to the GroupExpression
Events | summarize count() by sessionId, deviceName, deviceMac, ...
Option 2: since a sessionId is unique across all devices, I can take any device name from a sessionId group.
Events | summarize count(), deviceName=any(deviceName), deviceMac=any(deviceMac), ... by sessionId
Are there performance differences between the two options?
I have tried this on my data set (~1M records) and I didn't find any major difference in execution time. Thank you
Upvotes: 1
Views: 1320
Reputation: 7618
The second option is preferred: the aggregation key is smaller, and the size of the aggregation key affects memory consumption and potentially CPU usage.
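To make that concrete, here is a minimal sketch of the second form run against an inline table. The sample rows and the eventCount column name are made up for illustration; only sessionId, deviceName, and deviceMac come from the question. As far as I know, the Kusto documentation lists take_any() as the replacement for the deprecated any(), so the sketch uses that name; any() should behave the same way where it is still accepted.

```kusto
// Hypothetical sample data standing in for the real Events table.
let Events = datatable(sessionId:string, deviceName:string, deviceMac:string)
[
    "s1", "phone",  "AA:01",
    "s1", "phone",  "AA:01",
    "s2", "tablet", "BB:02",
];
Events
// Only sessionId is in the aggregation key; the device columns are
// picked from an arbitrary row of each group, which is safe only
// because every row of a session carries the same device values.
| summarize eventCount = count(),
            deviceName = take_any(deviceName),
            deviceMac  = take_any(deviceMac)
    by sessionId
```

This returns one row per sessionId. If the device columns could differ within a session, the two forms would no longer be equivalent, and the device columns would have to go back into the "by" clause.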
As a side note, your query counts the events per session ID, not per device. If you want the count per device, the device should be part of the "by" clause and the session ID should not be part of the aggregation.
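A minimal sketch of the per-device event count, again on made-up sample rows (the eventCount name is just an illustrative alias):

```kusto
// Hypothetical sample data standing in for the real Events table.
let Events = datatable(sessionId:string, deviceName:string)
[
    "s1", "phone",
    "s1", "phone",
    "s2", "tablet",
];
Events
// deviceName is the grouping key; sessionId is left out entirely,
// so events from all sessions of a device are counted together.
| summarize eventCount = count() by deviceName
```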
If you want to find the distinct count of sessions by device, use the following:
Events
| summarize dcount(sessionId) by deviceName
Upvotes: 2