Jeffrey Samuel

Reputation: 29

Error in AWS Auto Scaling configuration with Terraform

I am trying to set up an autoscaling environment with an AWS Auto Scaling group and a launch configuration.

Below are my tfvars for the launch configuration:

config_name = "name"
image_id = "ami-test"
instance_type = "c4.large"
key_name = "EC2-key"
security_groups = ["sg-123456789",
    "sg-123456789099"]
associate_public_ip_address = false
enable_monitoring = true
ebs_optimized = true
root_size = 10
root_volume_type = "standard"
root_encrypted = true
device_name = "/dev/sdf"
ebs_volume = 30
ebs_delete = true
ebs_encrypted = true
ebs_volume_type = "gp2"
iam_instance_profile = "arn:aws:iam::1234567890:instance-profile/EC2ROLE"

This creates the launch configuration without any issues, and it is almost identical to one created from the console.

Below are the tfvars for the Auto Scaling group:

scaling_name = "EC2-Scaling"
vpc_zone_identifier = ["subnet-123456789", "subnet-asdfghfjk"]
max_size = 2
min_size = 1
health_check_type = "ELB"
launch_configuration = "name"
termination_policies = ["NewestInstance",
    "OldestLaunchConfiguration"]
enabled_metrics = ["GroupInServiceCapacity",
    "GroupMaxSize",
    "GroupTotalCapacity",
    "GroupTotalInstances",
    "GroupMinSize"]
health_check_grace_period = 300
policy_name = "autoscaling_policy"

This also looks fine when I check it in the console. But when the Auto Scaling group tries to launch an instance, it throws the error below:

Launching a new EC2 instance: i-21358239842. Status Reason: Instance became unhealthy while waiting for instance to be in InService state. Termination Reason: Client.InternalError: Client error on launch

Please point out any errors in what I am doing, or let me know if I am missing something.

As requested in a comment, here are the resource definitions:

resource "aws_launch_configuration" "launch_configuration" {
  name = var.config_name
  image_id = var.image_id
  instance_type = var.instance_type
  key_name = var.key_name
  security_groups = var.security_groups
  associate_public_ip_address = var.associate_public_ip_address
  enable_monitoring = var.enable_monitoring
  ebs_optimized = var.ebs_optimized
  
  root_block_device {
    volume_size = var.root_size
    volume_type = var.root_volume_type
    encrypted = var.root_encrypted
  }
  
  ebs_block_device {
    device_name = var.device_name
    volume_size = var.ebs_volume
    delete_on_termination = var.ebs_delete
    encrypted = var.ebs_encrypted
    volume_type = var.ebs_volume_type
  }
  iam_instance_profile  = var.iam_instance_profile
}


resource "aws_autoscaling_group" "autoscaling" {
  name = var.scaling_name
  vpc_zone_identifier        = var.vpc_zone_identifier  
  max_size = var.max_size
  min_size = var.min_size
  health_check_type = var.health_check_type
  launch_configuration = var.launch_configuration
  termination_policies = var.termination_policies
  enabled_metrics = var.enabled_metrics
  
  instance_refresh {
    strategy = "Rolling"
  }
  
  health_check_grace_period = var.health_check_grace_period
  wait_for_capacity_timeout = 0 ##Skips waiting for capacity and proceeds to create a scaling group
}

resource "aws_autoscaling_policy" "dynamic_scaling" {
  name                   = var.policy_name
  adjustment_type        = "ChangeInCapacity"
  autoscaling_group_name = aws_autoscaling_group.autoscaling.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = 40.0
  }
}
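Separately from the launch error, one detail in the code above may be worth changing: launch_configuration is passed in as the plain string "name" from tfvars, so Terraform sees no dependency between the two resources and may try to create the Auto Scaling group before the launch configuration exists. A minimal sketch that references the resource directly instead, assuming all other arguments stay as in the question:

```hcl
resource "aws_autoscaling_group" "autoscaling" {
  name                = var.scaling_name
  vpc_zone_identifier = var.vpc_zone_identifier
  max_size            = var.max_size
  min_size            = var.min_size

  # Referencing the resource (instead of var.launch_configuration = "name")
  # creates an implicit dependency, so Terraform creates the launch
  # configuration before the Auto Scaling group that uses it.
  launch_configuration = aws_launch_configuration.launch_configuration.name

  # ... remaining arguments unchanged from the question ...
}
```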

Currently I am thinking of solving this with one of two approaches.

As mentioned by @Arun K, set up an ALB with a health check that forwards requests to the Auto Scaling group, or ring health check to this
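For comparison, the approach that avoids a load balancer entirely would be to let the group rely on EC2 status checks instead of ELB health checks ("EC2" is the provider's default health_check_type). This is a minimal sketch, and it is an assumption that this is what the second option refers to; everything not shown stays as in the question:

```hcl
resource "aws_autoscaling_group" "autoscaling" {
  # ... other arguments as in the question ...

  # No load balancer is attached, so judge instance health by
  # EC2 status checks only.
  health_check_type         = "EC2"
  health_check_grace_period = var.health_check_grace_period
}
```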

Upvotes: 0

Views: 4213

Answers (2)

Jeffrey Samuel

Reputation: 29

Credit to @Arunk, who pointed out the error in the Auto Scaling group configuration.

The main cause of the error was:

resource "aws_autoscaling_group" "autoscaling" {
  # ...
  health_check_type = "ELB"
  # ...
}

I had set the health check type to ELB (Elastic Load Balancing), but I had not attached the Auto Scaling group to any load balancer. All I had to do was create the complete stack below:

resource "aws_lb" "example" {
  load_balancer_type = "gateway"
  name               = "example"

  subnet_mapping {
    subnet_id = aws_subnet.example.id
  }
}

resource "aws_lb_target_group" "example" {
  name     = "example"
  port     = 6081
  protocol = "GENEVE"
  vpc_id   = aws_vpc.example.id

  health_check {
    port     = 80
    protocol = "HTTP"
  }
}

resource "aws_lb_listener" "example" {
  load_balancer_arn = aws_lb.example.arn

  default_action {
    target_group_arn = aws_lb_target_group.example.arn
    type             = "forward"
  }
}
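
resource "aws_autoscaling_attachment" "asg_attachment_bar" {
  # Attach the Auto Scaling group from the question to the target group above.
  # (Older AWS provider versions call this argument alb_target_group_arn.)
  autoscaling_group_name = aws_autoscaling_group.autoscaling.name
  lb_target_group_arn    = aws_lb_target_group.example.arn
}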

Note: the code above is copied from the Terraform documentation.

Once this setup was in place, the error was resolved.
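The stack above is the Terraform documentation's Gateway Load Balancer example (GENEVE on port 6081), which is meant for network appliances. For the ALB mentioned in the question, a sketch with an application load balancer might look like the following. Assumptions: the instances serve HTTP on port 80 with a "/" health endpoint, the two subnets from the question are in different Availability Zones, reusing var.security_groups for the ALB is acceptable (you may want a dedicated group), and var.vpc_id is a hypothetical variable not present in the original.

```hcl
# Hypothetical variable: the VPC that contains the question's subnets.
variable "vpc_id" {
  type = string
}

resource "aws_lb" "app" {
  name               = "ec2-scaling-alb"
  load_balancer_type = "application"
  subnets            = var.vpc_zone_identifier # at least two AZs required
  security_groups    = var.security_groups
}

resource "aws_lb_target_group" "app" {
  name     = "ec2-scaling-tg"
  port     = 80
  protocol = "HTTP"
  vpc_id   = var.vpc_id

  # Assumed health endpoint; adjust to what the application serves.
  health_check {
    path    = "/"
    matcher = "200"
  }
}

resource "aws_lb_listener" "app" {
  load_balancer_arn = aws_lb.app.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app.arn
  }
}

# Registers the group's instances with the target group, so that
# health_check_type = "ELB" has load balancer health checks to use.
resource "aws_autoscaling_attachment" "app" {
  autoscaling_group_name = aws_autoscaling_group.autoscaling.name
  lb_target_group_arn    = aws_lb_target_group.app.arn
}
```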

Upvotes: 0

Paul Stanley

Reputation: 4098

From the Terraform documentation for aws_autoscaling_group:

wait_for_capacity_timeout (Default: "10m") A maximum duration that Terraform should wait for ASG instances to be healthy before timing out. (See also Waiting for Capacity below.) Setting this to "0" causes Terraform to skip all Capacity Waiting behavior.

https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/autoscaling_group

I think it's unhealthy because it can't communicate yet, judging from the EC2 error. 0 seconds is too short for an EC2 instance to go from initialising to InService, and that check takes place after Terraform creates the "aws_autoscaling_group" resource. If I were a web user (or a health check) hitting an EC2 instance that is still initialising, I'd get a 500, not a "500, but the EC2 instance will be spun up soon, try again in a minute". In resource "aws_autoscaling_group" "autoscaling", try giving it a value:

wait_for_capacity_timeout = "5m" # 300 seconds; the value is a duration string like the default "10m"

I've set it on the basis of your other value:

health_check_grace_period = 300

This value means Auto Scaling waits 300 seconds after an EC2 instance comes into service before checking its health.
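Put together, the two settings in the Auto Scaling group could look like the sketch below (other arguments as in the question). Per the documentation quoted above, wait_for_capacity_timeout governs how long Terraform itself waits for healthy instances during apply, while health_check_grace_period is an Auto Scaling setting applied to each new instance.

```hcl
resource "aws_autoscaling_group" "autoscaling" {
  # ... other arguments as in the question ...

  # Seconds Auto Scaling waits after an instance comes into service
  # before checking its health.
  health_check_grace_period = 300

  # How long Terraform waits for healthy capacity during apply.
  # A duration string: "5m" is 300 seconds; "0" skips waiting entirely.
  wait_for_capacity_timeout = "5m"
}
```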

Upvotes: 1
