Reputation: 981
Analysing some log files using AWS CloudWatch Insights, I can plot a count aggregated in time bins with:
| stats count(*) by bin(1h)
This produces a graph, as expected, aggregating all logs in each time bin.
I want to split this data by a 'group' field, with values A and B.
| stats count(*) by group, bin(1h)
This returns log counts across time bins as expected, but the visualisation tab says 'No visualisation available.' I would like it to return a time series plot with a series for group A and a series for group B.
Where am I going wrong, or is this simply not possible?
Upvotes: 50
Views: 50833
Reputation: 621
When faced with such a problem, I ended up writing a Python script to generate the query:
errors = [
    'Some error message',
    'Another error message',
    'And another',
]

# One 0/1 flag field per known error message.
fields = ', '.join(f"@message like '{e}' as E{i}" for i, e in enumerate(errors))
# Sum of all flags; 0 means none of the known messages matched.
other = '+'.join(f"E{i}" for i in range(len(errors)))
# One stats column per message, with the text sanitised into a legal alias.
stats = ', '.join(f"sum(E{i}) as {e.replace(' ', '_').replace('=', '').replace(':', '').replace(',', '')}" for i, e in enumerate(errors))
print(rf"""
fields @timestamp, @message
| filter level = 'error'
| fields {fields}, {other} == 0 as other
#| parse @message /(?<msg>[^\d\[]*).*/ | filter other == 1 | stats count() by msg
| stats {stats}, sum(other) as Other by bin(1d)
""")
This outputs the following query:
fields @timestamp, @message
| filter level = 'error'
| fields @message like 'Some error message' as E0, @message like 'Another error message' as E1, @message like 'And another' as E2, E0+E1+E2 == 0 as other
#| parse @message /(?<msg>[^\d\[]*).*/ | filter other == 1 | stats count() by msg
| stats sum(E0) as Some_error_message, sum(E1) as Another_error_message, sum(E2) as And_another, sum(other) as Other by bin(1d)
(The commented-out line is an alternative to the last line; it helps catch common error messages that can then be added to the list in the Python script.)
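For reuse, the same generation logic can be wrapped in a function. A sketch of that idea; `build_error_query` and the regex-based `sanitise` helper are my own additions, not part of the original script:

```python
import re

def sanitise(msg):
    # Collapse anything that isn't a word character into '_' so the
    # message text becomes a legal Insights field alias.
    return re.sub(r"\W", "_", msg)

def build_error_query(errors, bin_size="1d"):
    # One 0/1 flag per known error message, plus an 'other' bucket.
    flags = ", ".join(f"@message like '{e}' as E{i}" for i, e in enumerate(errors))
    none_matched = "+".join(f"E{i}" for i in range(len(errors)))
    stats = ", ".join(f"sum(E{i}) as {sanitise(e)}" for i, e in enumerate(errors))
    return (
        "fields @timestamp, @message\n"
        "| filter level = 'error'\n"
        f"| fields {flags}, {none_matched} == 0 as other\n"
        f"| stats {stats}, sum(other) as Other by bin({bin_size})"
    )

print(build_error_query(['Some error message', 'Another error message']))
```

The generated text can then be pasted straight into the Logs Insights query editor.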
Upvotes: 0
Reputation: 708
Although it's still not possible to view multiple time series at once in CloudWatch Logs Insights, I was able to do this using Grafana.
My query:
fields @timestamp, @log, @logStream, @message
| filter ispresent(ContainerName) and ContainerName like /my-app-prod/ and Type = "Container"
| filter ispresent(TaskId)
| stats avg(CpuUtilized) by bin(1m), TaskId
In CloudWatch Logs Insights, the above query gives me minute-by-minute CPU utilization for each task, but clicking on the visualization tab gives me an error.
The same query in Grafana (> 7.x) shows the following result:
Upvotes: 1
Reputation: 73625
Here is another variation, based on the answers of @smth et al. Not being able to build the chart automatically from the field's values is very annoying, so you need to mention each value explicitly.
fields @timestamp, @message
| filter namespace like "audit-log"
| fields coalesce(`audit.client-version`, "legacy") as version
| parse version "legacy" as v0
| parse version "886ac066*" as v1
| parse version "93e021e2*" as v2
| stats count(*), count(v0), count(v1), count(v2) by bin(5m)
Upvotes: 17
Reputation: 491
Expanding on the sample by @smth, I usually do it a little differently.
With this query I follow the trend of status codes aggregated over time in a standard nginx access log:
fields @timestamp, @message
| parse @message '* - * [*] "* * *" * * "-" "*"' as host, identity, dateTimeString, httpVerb, url, protocol, status, bytes, useragent
| stats count(*) as all, sum(status <= 299) as c_s200, sum(status >= 300 and status <= 399) as c_s300, sum(status >= 400 and status <= 499) as c_s400, sum(status >= 500) as c_s500 by bin(1m)
The trick here is that the status comparisons above evaluate to 0 when false and 1 when true, so summing them within each time bucket simulates a 'count if [condition]'.
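The same 'count if' behaviour can be checked in plain Python, where booleans also coerce to 0/1 under sum(). A standalone illustration with made-up status codes, not Insights syntax:

```python
# Simulated status codes falling into one time bucket.
statuses = [200, 201, 301, 404, 404, 500, 503]

# A boolean expression coerces to 0/1 under sum(), so summing it
# counts the matching rows, the same effect as in the query above.
c_2xx = sum(200 <= s <= 299 for s in statuses)
c_3xx = sum(300 <= s <= 399 for s in statuses)
c_4xx = sum(400 <= s <= 499 for s in statuses)
c_5xx = sum(s >= 500 for s in statuses)

print(c_2xx, c_3xx, c_4xx, c_5xx)  # 2 1 2 2
```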
This is what the resulting graph looks like on the visualisation tab:
Upvotes: 34
Reputation: 616
There is also an extension to the workaround described by @smth that can support more complicated statistics than count(). Here's an example that graphs the CPU usage across different instances over time:
| fields (instance_id like "i-instance_1") as is_instance_1, (instance_id like "i-instance_2") as is_instance_2
| stats sum(cpu * is_instance_1) as cpu_1, sum(cpu * is_instance_2) as cpu_2 by bin(5m)
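One caveat worth noting: sum(cpu * is_instance_1) yields the instance's total CPU per bin, not its average; to recover a mean you would also divide by the flag count (sum of is_instance_1). A plain-Python sketch of that arithmetic, with sample rows made up for illustration:

```python
# Rows: (instance_id, cpu) samples falling into one 5-minute bin.
rows = [("i-instance_1", 40.0), ("i-instance_2", 10.0),
        ("i-instance_1", 60.0), ("i-instance_2", 30.0)]

# The 0/1 flag zeroes out other instances' samples before summing.
flags_1 = [int(iid == "i-instance_1") for iid, _ in rows]
cpus = [c for _, c in rows]

total_1 = sum(c * f for c, f in zip(cpus, flags_1))  # like sum(cpu * is_instance_1)
avg_1 = total_1 / sum(flags_1)  # divide by the number of matching samples

print(total_1, avg_1)  # 100.0 50.0
```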
Upvotes: 11
Reputation: 662
Alright, I found a very janky way to solve this. It appears that no, you cannot do
| stats count(*) by group, bin(1h)
However, you can use parse to artificially create new variables, like so:
parse "[E*]" as @error
| parse "[I*]" as @info
| parse "[W*]" as @warning
| filter ispresent(@warning) or ispresent(@error) or ispresent(@info)
| stats count(@error) as error, count(@info) as info, count(@warning) as warning by bin(15m)
Here, I'm trying to look at the distribution of log types over time. I have three types of log messages, with the formats "[ERROR]", "[INFO]", and "[WARNING]".
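The effect of the parse lines, a field that exists only on rows where the pattern matched, so count() over it counts matches, can be mimicked in Python. The sample log lines below are assumed, for illustration only:

```python
# Assumed sample log lines, for illustration only.
logs = ["[ERROR] db down", "[INFO] started", "[WARNING] slow", "[INFO] ready"]

def parse_flag(line, prefix):
    """Mimic `parse "[E*]" as @error`: yield a value on match, None otherwise."""
    return line if line.startswith(prefix) else None

errors = [parse_flag(l, "[E") for l in logs]
infos = [parse_flag(l, "[I") for l in logs]
warns = [parse_flag(l, "[W") for l in logs]

# count(field) in Insights only counts rows where the field is present,
# which is what count(@error) etc. rely on in the query above.
print(sum(v is not None for v in errors),
      sum(v is not None for v in infos),
      sum(v is not None for v in warns))  # 1 2 1
```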
Upvotes: 45