Reputation: 123670
I have set up a CloudWatch metric filter to watch a log group:
```hcl
resource "aws_cloudwatch_log_metric_filter" "log_errors" {
  name           = "${local.fullname}-log-errors"
  log_group_name = "/aws/lambda/${local.fullname}"
  pattern        = "{ $._logLevel = \"error\" }"

  metric_transformation {
    name      = "${local.fullname}-error-count"
    namespace = "MyApp"
    value     = "1"
  }
}
```
I can see the metric is working: the metric graph shows a data point at 13:15, which is me manually creating a log entry to test.
I also set up an alarm that should fire if the metric reports one or more events within a minute:
```hcl
resource "aws_cloudwatch_metric_alarm" "log_errors_alarm" {
  alarm_name          = "${local.fullname}-log-errors"
  alarm_description   = "log.error() count for MyApp lambda ${local.fullname}"
  metric_name         = "${local.fullname}-error-count"
  threshold           = "0"
  statistic           = "Sum"
  unit                = "Count"
  comparison_operator = "GreaterThanThreshold"
  datapoints_to_alarm = "1"
  evaluation_periods  = "1"
  period              = "60"
  namespace           = "MyApp"
  treat_missing_data  = "notBreaching"
  alarm_actions       = [data.aws_ssm_parameter.sns_topic_arn.value]
  ok_actions          = [data.aws_ssm_parameter.sns_topic_arn.value]
}
```
But even though the metric recorded that event, the alarm never fires.
I'm unsure how to debug this: all the AWS resources are created successfully, the errors I create manually reach the metric, and I use a very similar alarm configuration in other Lambdas, where it does fire.
Why is my metric working but my alarm not alarming?
Upvotes: 3
Views: 1270
Reputation: 738
I have something very similar set up that works, so I would try this. Update: looking at it more closely, I believe you should be using `comparison_operator = "GreaterThanOrEqualToThreshold"`, not `comparison_operator = "GreaterThanThreshold"`.
```hcl
metric_transformation {
  name          = "${local.fullname}-error-count"
  namespace     = "MyApp"
  value         = "1"
  default_value = "0"
}
```
and
```hcl
resource "aws_cloudwatch_metric_alarm" "log_errors_alarm" {
  alarm_name          = "${local.fullname}-log-errors"
  alarm_description   = "log.error() count for MyApp lambda ${local.fullname}"
  metric_name         = "${local.fullname}-error-count"
  threshold           = "1"
  statistic           = "Sum"
  #unit               = "Count"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  #datapoints_to_alarm = "1"
  evaluation_periods  = "1"
  period              = "60"
  namespace           = "MyApp"
  treat_missing_data  = "notBreaching"
  alarm_actions       = [data.aws_ssm_parameter.sns_topic_arn.value]
  ok_actions          = [data.aws_ssm_parameter.sns_topic_arn.value]
}
```
Both `unit` and `datapoints_to_alarm` are optional parameters, so try leaving them out. I assume both resources, `aws_cloudwatch_log_metric_filter` and `aws_cloudwatch_metric_alarm`, use the same local variables. Since you did not post all of your `aws_cloudwatch_log_metric_filter` parameters, I'm assuming your `pattern = ""` is what it should be.
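One way to make sure the filter and the alarm really point at the same metric is to derive the metric name and namespace from shared locals, so a typo in one resource cannot silently diverge from the other. This is a minimal sketch: the `error_metric_name` and `error_metric_namespace` locals are hypothetical names introduced for illustration, while `local.fullname` and `data.aws_ssm_parameter.sns_topic_arn` come from the question. It also leaves `unit` and `datapoints_to_alarm` out of the alarm, as suggested above.

```hcl
# Hypothetical shared locals: both resources read the metric name and
# namespace from one place, so they cannot drift apart.
locals {
  error_metric_name      = "${local.fullname}-error-count"
  error_metric_namespace = "MyApp"
}

resource "aws_cloudwatch_log_metric_filter" "log_errors" {
  name           = "${local.fullname}-log-errors"
  log_group_name = "/aws/lambda/${local.fullname}"
  pattern        = "{ $._logLevel = \"error\" }"

  metric_transformation {
    name      = local.error_metric_name
    namespace = local.error_metric_namespace
    value     = "1"
  }
}

resource "aws_cloudwatch_metric_alarm" "log_errors_alarm" {
  alarm_name          = "${local.fullname}-log-errors"
  metric_name         = local.error_metric_name
  namespace           = local.error_metric_namespace
  statistic           = "Sum"
  period              = 60
  evaluation_periods  = 1
  threshold           = 1
  comparison_operator = "GreaterThanOrEqualToThreshold"
  treat_missing_data  = "notBreaching"
  # unit and datapoints_to_alarm are intentionally omitted (both optional).
  alarm_actions       = [data.aws_ssm_parameter.sns_topic_arn.value]
  ok_actions          = [data.aws_ssm_parameter.sns_topic_arn.value]
}
```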
Upvotes: 0
Reputation: 12129
I'd put my money on the `unit` being inconsistent between the `metric_alarm` and the `metric_filter`.

You're setting the `unit` on the `metric_alarm` to `Count`, but you're not setting a `unit` on the `metric_filter`'s `metric_transformation`, so the `metric_transformation` will default to `None`.

Try setting the `unit` in the alarm to `None`, or removing `unit` altogether.
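The mismatch can also be fixed from the other side, by publishing the metric with the unit the alarm already expects. A minimal sketch, assuming your AWS provider version supports the optional `unit` argument inside `metric_transformation`; all other values are copied from the question.

```hcl
resource "aws_cloudwatch_log_metric_filter" "log_errors" {
  name           = "${local.fullname}-log-errors"
  log_group_name = "/aws/lambda/${local.fullname}"
  pattern        = "{ $._logLevel = \"error\" }"

  metric_transformation {
    name      = "${local.fullname}-error-count"
    namespace = "MyApp"
    value     = "1"
    # Publish data points with unit Count so they match the alarm's
    # unit = "Count"; if omitted, the unit defaults to None.
    unit      = "Count"
  }
}
```

With this change the alarm can keep `unit = "Count"` as it is. Removing `unit` from the alarm instead avoids touching the filter at all; either way, the alarm and the published data points must agree on the unit.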
Upvotes: 4