ecstrim
ecstrim

Reputation: 61

Grouping similar column string values

I have a table in Azure Log Analytics where messages are logged. There aren't many distinct messages actually, but in every one there is a variable part like an user id or a timestamp. I need to count the distinct message types grouped by one hour intervals, ignoring the variable elements in every message (UUID and timestamp in this case). I don't know all the message types. I cannot touch anything else, I am forced to work with this table.

Example data:

timestamp | message 
----------|--------------------------------------------------------
          | Message type A for user id 993215f6-c42a-4957-bd55-78d71306a8d0
          | Message type A for user id 60e7d02c-770a-4641-b379-6bd33fcd563c
          | Message type A for user id 5bf7646c-092b-4e20-ba43-de7fe01010ea
          | Another message type containing timestamp hh:mm:ss
          | Another message type containing timestamp hh:mm:ss
          | Another message type containing timestamp hh:mm:ss
          | Type C message <variable_string>

Desired output:

timestamp                   | distinct_message                           | count 
----------------------------|--------------------------------------------|------
10/2/2019, 10:00:00.000 AM  | Message type A for user id                 | 25
10/2/2019, 10:00:00.000 AM  | Another message type containing timestamp  | 13
10/2/2019, 10:00:00.000 AM  | Type C message                             | 0
10/2/2019, 11:00:00.000 AM  | Message type A for user id                 | 4
10/2/2019, 11:00:00.000 AM  | Another message type containing timestamp  | 6
10/2/2019, 11:00:00.000 AM  | Type C message                             | 2

This is what I've managed to create, but my knowledge of KQL is quite limited.

let regex_uid = "[[:xdigit:]]+-[[:xdigit:]]+-[[:xdigit:]]+-[[:xdigit:]]+-[[:xdigit:]]+";
traces 
| where timestamp > ago(1d) 
| extend message = replace(regex_uid, "", message) 
| extend message = replace("[0-9]+", "", message) 
| extend message = iif(message startswith "Type C message", "Type C message", message ) 
| project timestamp, message, operation_Name
| summarize count(operation_Name) by bin(timestamp, 1h), message

Is there any better way to do this?

Upvotes: 0

Views: 1736

Answers (1)

Yoni L.
Yoni L.

Reputation: 25895

another option for you to consider is using the reduce operator: https://learn.microsoft.com/en-us/azure/kusto/query/reduceoperator

the output won't be identical to the one in your question. though if I understand your intention correctly, it follows the same principles.

Upvotes: 3

Related Questions