Reputation: 8498
We have an AWS CloudWatch alarm whose metric has very clearly gone over the threshold line shown in the graph being monitored, but the alarm didn't trigger.
What is going on here? How can a metric clearly stay over the threshold for far longer than the alarm's period and evaluation time without the alarm ever triggering?
Upvotes: 1
Views: 413
Reputation: 8498
If we look at the settings for the alarm, there are two interesting things to note.
The first is that the alarm is in the Insufficient Data state even though the metric graph shows a continuous, unbroken line of data points.
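As an aside, the alarm state can also be confirmed from the CLI; the alarm name below is a placeholder, substitute your own:
aws cloudwatch describe-alarms --alarm-names "<alarm-name>" --query 'MetricAlarms[0].[StateValue,StateReason]' --region <region>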
The second is that the alarm is configured with Seconds as the unit, while the graph above shows the metric in milliseconds. In fact, if we list recent statistics for the iterator age:
aws cloudwatch get-metric-statistics --namespace "AWS/Lambda" --metric-name "IteratorAge" --dimensions Name=FunctionName,Value=prod-pipeline-rules-exec --statistics Maximum --start-time $(gdate -u -d '20 minutes ago' +%Y-%m-%dT%TZ) --end-time $(gdate -u +%Y-%m-%dT%TZ) --period 60 --region <region>
[
    {
        "Timestamp": "2019-12-18T01:43:00Z",
        "Maximum": 2327.0,
        "Unit": "Milliseconds"
    },
    {
        "Timestamp": "2019-12-18T01:25:00Z",
        "Maximum": 2188.0,
        "Unit": "Milliseconds"
    },
    {
        "Timestamp": "2019-12-18T01:34:00Z",
        "Maximum": 2459.0,
        "Unit": "Milliseconds"
    }
]
The unit is Milliseconds.
Unfortunately, CloudWatch treats a unit mismatch as missing data, which leaves the alarm stuck in Insufficient Data and means it will never trigger.
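One way to fix this is to recreate the alarm with the unit the metric actually reports (or omit --unit entirely, in which case the alarm uses data points regardless of unit). A sketch of that, where the alarm name, threshold, and evaluation periods are illustrative values rather than the ones from this alarm:
aws cloudwatch put-metric-alarm \
  --alarm-name "prod-pipeline-rules-exec-iterator-age" \
  --namespace "AWS/Lambda" \
  --metric-name "IteratorAge" \
  --dimensions Name=FunctionName,Value=prod-pipeline-rules-exec \
  --statistic Maximum \
  --period 60 \
  --evaluation-periods 5 \
  --threshold 60000 \
  --comparison-operator GreaterThanThreshold \
  --unit Milliseconds \
  --region <region>
Once the configured unit matches the metric's unit (or is omitted), the alarm should leave the Insufficient Data state after its next evaluation.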
Upvotes: 1