user1129682

Reputation: 1091

AWS SNS, Lambda and AutoScaling timing issues

I am using Terraform for AWS deployments and currently I am trying to tie a Lambda function to the scaling behavior of an ECS cluster. This works in general, but the timing of things is unacceptable. In my latest attempt, the cluster grows at 1:11pm, 1:14pm, 1:17pm and at 1:20pm, but the Lambda function is triggered at 1:11pm, 1:36pm, 1:38pm and 1:56pm.

I am looking for a solution where the Lambda function is triggered at (about) the time the cluster scales, i.e. when it spawns additional EC2 instances.

My approach works like this:

resource "aws_cloudwatch_metric_alarm" "ecs_grow" {
    [...]
    comparison_operator = "GreaterThanOrEqualToThreshold"
    namespace = "AWS/ECS"
    metric_name = "CPUUtilization"
    threshold = "16"
    statistic = "Average"
    period = "60"
    evaluation_periods  = "1"
    alarm_actions = [
        "${aws_autoscaling_policy.grow_policy.arn}",
        "${aws_sns_topic.scaling_topic.arn}"
    ]
}

The Lambda function currently only writes the event into a log.
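
For reference, the wiring between the SNS topic and the Lambda function could look roughly like the following. This is only a sketch, not the original configuration: the function resource `aws_lambda_function.scaling_logger`, the topic name `scaling-topic`, and the resource labels `allow_sns` and `scaling_to_lambda` are all assumed for illustration. Only `aws_sns_topic.scaling_topic` is referenced in the alarm above.

```hcl
# Topic referenced in the alarm's alarm_actions (the name value is assumed).
resource "aws_sns_topic" "scaling_topic" {
  name = "scaling-topic"
}

# Allow SNS to invoke the function (the function resource name is hypothetical).
resource "aws_lambda_permission" "allow_sns" {
  statement_id  = "AllowExecutionFromSNS"
  action        = "lambda:InvokeFunction"
  function_name = "${aws_lambda_function.scaling_logger.function_name}"
  principal     = "sns.amazonaws.com"
  source_arn    = "${aws_sns_topic.scaling_topic.arn}"
}

# Deliver every message published to the topic to the function.
resource "aws_sns_topic_subscription" "scaling_to_lambda" {
  topic_arn = "${aws_sns_topic.scaling_topic.arn}"
  protocol  = "lambda"
  endpoint  = "${aws_lambda_function.scaling_logger.arn}"
}
```

With a subscription like this, the function runs once per message published to the topic, so the timing of invocations follows the timing of publications.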

With this setup I generate load on my cluster so that it scales out every 3 minutes. I can validate that this works by looking at the CloudWatch metrics GroupDesiredCapacity and GroupTotalInstances, and of course at the EC2 instances shown in the AWS console. Indeed, the cluster grows by 5 instances every 3 minutes.

I started out with 5 instances and let the cluster scale out 4 times, so at the end I had a cluster with a total of 25 instances. In my CloudWatch metrics I can see the GroupDesiredCapacity graph climb by 5 at 1:11pm, 1:14pm, 1:17pm and 1:20pm, just as expected and in accordance with what I can see in the AWS console.

My problem is that the Lambda function is triggered much later than the scaling events. I get log entries at 1:11pm, 1:36pm, 1:38pm and 1:56pm.

What really confuses me is that the StateChangeTime values reported in the alarm messages are also 1:11pm, 1:36pm, 1:38pm and 1:56pm. So it would appear that the Lambda function is indeed triggered as soon as the messages are published, and that it is the publication itself that lags behind the scaling.

Where does this mismatch between the triggering of the autoscale policy and the message publication come from? More importantly, how do I align the two?

Upvotes: 2

Views: 393

Answers (1)

user1129682

Reputation: 1091

I found a workaround. Instead of relying on the alarm to publish to the topic, I am using an autoscaling notification to send the notifications:

resource "aws_autoscaling_notification" "scaling_notifications" {
  group_names = [ "${aws_autoscaling_group.ecs.name}" ]
  notifications = [ "autoscaling:EC2_INSTANCE_LAUNCH" ]
  topic_arn = "${aws_sns_topic.scaling_topic.arn}"
}

For some reason this triggers the Lambda function when, and as often as, I want it to. Note that it is the same SNS topic and the same Lambda function as before!

Upvotes: 1
