user11631417
user11631417

Reputation:

Kusto Group By Query

I have some log data structured like:

{
    timestamp: mm/dd/yyy hh:mm:ss
    customDimensions : {
      location: "locationA",
      subLocation: "subLocationB",
      dataName: "dataNameB",
      data: "value"
    } 
}

I need to return the latest record for each "dataName" by location/subLocation.

e.g.

locationA sublocationB dataName value
locationA sublocationC dataName value
locationA sublocationB dataname2 value

Basically perform a "group by" on fields: location,sublocation, and dataName where max(timestamp)

How can I achieve this with Kusto as there is no group by support and the summarize function seems to require an aggregate function?

Upvotes: 2

Views: 9043

Answers (2)

Filippo Carraro
Filippo Carraro

Reputation: 51

One possible solution is:

<name of the table>
| summarize arg_max(customDimensions.dataName) by location, subLocation

you are supposed to have extracted a column that contains the "customDimensions" data and that is dynamic.

P.S. Since the column is dynamic, before you can run arg_max() you must cast the reference data type for that field.

Upvotes: 0

Yoni L.
Yoni L.

Reputation: 25895

You could try extending the dynamic properties into calculated columns, then summarizing using arg_max() to get the latest (according to timestamp) by location, subLocation and dataName.

For example:

datatable(d:dynamic)[
    dynamic({
    "timestamp": "2020-05-12 20:32:08",
    "customDimensions" : {
      "location": "locationA",
      "subLocation": "subLocationB",
      "dataName": "dataNameB",
      "data": "value"
        } 
    }), dynamic({
    "timestamp": "2020-05-12 20:35:08",
    "customDimensions" : {
      "location": "locationA",
      "subLocation": "subLocationB",
      "dataName": "dataNameB",
      "data": "value"
        } 
    }), dynamic({
    "timestamp": "2020-05-12 20:25:08",
    "customDimensions" : {
      "location": "locationB",
      "subLocation": "subLocationB",
      "dataName": "dataNameC",
      "data": "value"
        } 
    })
]
| project timestamp = todatetime(d.timestamp),
          location = tostring(d.customDimensions.location),
          subLocation = tostring(d.customDimensions.subLocation),
          dataName = tostring(d.customDimensions.dataName), data = tostring(d.customDimensions.data)
| summarize arg_max(timestamp, *) by location, subLocation, dataName

Upvotes: 2

Related Questions