Raghav Kukreja

Reputation: 175

CloudWatch doesn't reliably monitor a single datapoint within 24 hours

I have a Lambda function that is scheduled to run once every 24 hours. I also have a CloudWatch alarm that fires if the number of invocations drops below 1 in a 24-hour period.
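For concreteness, the alarm described above might be expressed as keyword arguments for boto3's CloudWatch `put_metric_alarm` call. This is a sketch under assumptions: the function name `daily-job` and the alarm naming scheme are hypothetical, and `TreatMissingData="breaching"` is an assumed choice, since Lambda usually publishes no `Invocations` datapoint at all for a period in which it never ran. The dictionary is only built here, not sent to AWS.

```python
def daily_invocation_alarm(function_name: str) -> dict:
    """Build put_metric_alarm kwargs for 'fewer than 1 invocation per day'."""
    return {
        "AlarmName": f"{function_name}-missed-daily-run",  # hypothetical naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": "Invocations",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 86400,  # 24 hours, in seconds
        "EvaluationPeriods": 1,  # a single daily datapoint decides the state
        "Threshold": 1,
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": "breaching",  # assumption: no datapoint means no run
    }


params = daily_invocation_alarm("daily-job")  # hypothetical function name
# With credentials configured, this could be sent as:
#   boto3.client("cloudwatch").put_metric_alarm(**params)
print(params["Period"], params["ComparisonOperator"])
```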

The issue is that the invocation metric doesn't always show up in time for the alarm evaluation. As a result, the sliding 24-hour window (the alarm evaluation period) briefly contains 0 invocations. The alarm changes state, only to recover within 1 minute once the metric becomes available to be evaluated.
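The race can be reproduced with a toy model. This is a simplified sketch, not how CloudWatch evaluates alarms internally: it assumes a hypothetical run at 00:00 every day whose datapoint becomes visible `delay_min` minutes later, and an evaluation every minute that sums the visible invocations in the trailing 24-hour window.

```python
MINUTES_PER_DAY = 24 * 60


def visible_invocations(now: int, runs: list[int], delay_min: int) -> int:
    """Count runs inside the window (now - 24h, now] whose datapoint has arrived."""
    return sum(
        1
        for r in runs
        if now - MINUTES_PER_DAY < r <= now and r + delay_min <= now
    )


def alarm_minutes(days: int, delay_min: int) -> list[int]:
    """Return every minute at which 'invocations < 1' would evaluate to ALARM."""
    runs = [d * MINUTES_PER_DAY for d in range(days)]  # one run at 00:00 each day
    start = MINUTES_PER_DAY  # skip day 0, when the window is not yet full
    end = days * MINUTES_PER_DAY
    return [
        m for m in range(start, end)
        if visible_invocations(m, runs, delay_min) < 1
    ]


print(alarm_minutes(days=3, delay_min=1))  # [1440, 2880]
```

With a 1-minute delay, the alarm flips at the moment each new run is due and recovers one minute later, which matches the flapping described above; with `delay_min=0` it never fires.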

Now, all of this would be easy to tackle if CloudWatch supported evaluation periods longer than 24 hours, but alas, it doesn't. How do I tackle this situation?

Am I approaching this problem correctly? If so, then how do I work around this CloudWatch limitation without introducing unnecessary complexity?

Upvotes: 4

Views: 952

Answers (1)

JD D

Reputation: 8137

Monitoring/alarming on a single datapoint is always going to be tricky and is definitely a limitation of the service.

I would say you should rethink your alarm. Why do you need to alarm on whether your Lambda executes? That is really monitoring whether CloudWatch rules are working, which you should be able to trust.
I suggest alarming on whether your Lambda throws errors, or, if possible, monitoring the results of the actions your Lambda takes.
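Alarming on errors instead could look like the following `put_metric_alarm` keyword arguments, again only built as a dict and not sent to AWS. Assumptions: the function name is hypothetical, the 5-minute period is an arbitrary choice, and `TreatMissingData="notBreaching"` reflects that a period with no invocations produces no datapoint, and no run means no error.

```python
def lambda_error_alarm(function_name: str, period_s: int = 300) -> dict:
    """Build put_metric_alarm kwargs that alarm on any Lambda error."""
    return {
        "AlarmName": f"{function_name}-errors",  # hypothetical naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": period_s,  # assumption: 5-minute buckets by default
        "EvaluationPeriods": 1,
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",  # one error is enough
        "TreatMissingData": "notBreaching",  # no run in this period, so no error
    }


error_params = lambda_error_alarm("daily-job")  # hypothetical function name
print(error_params["MetricName"], error_params["ComparisonOperator"])
```

Unlike the invocation alarm, this one has nothing to say on quiet days, so late or missing datapoints cannot make it flap.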

If you really must alarm on executions, the best you can probably do is alarm when there are no executions for 2 consecutive datapoints instead of just one. This will be a lot more stable, but you may not be notified of the issue for an extra 24 hours.
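The effect of requiring 2 datapoints can be sketched with a toy "M out of N" check over daily sums. This is a simplified model, not CloudWatch's actual evaluation logic: it assumes a late datapoint simply reads as 0 for the current day. In a real alarm the corresponding `put_metric_alarm` settings would be `EvaluationPeriods=2` and `DatapointsToAlarm=2`.

```python
def should_alarm(daily_sums: list[int], datapoints_to_alarm: int = 2) -> bool:
    """True only if each of the most recent `datapoints_to_alarm` daily sums is below 1."""
    recent = daily_sums[-datapoints_to_alarm:]
    return len(recent) == datapoints_to_alarm and all(s < 1 for s in recent)


# Yesterday ran; today's datapoint has not arrived yet, so it reads as 0.
print(should_alarm([1, 0]))     # False: one late datapoint is not enough
print(should_alarm([1, 0], 1))  # True: the single-datapoint alarm flaps here
print(should_alarm([1, 0, 0]))  # True: two consecutive missed days
```

The trade-off is visible in the last case: a genuinely missed run is only reported once the second day also comes up empty.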

Upvotes: 2
