Reputation: 99
My instances keep failing their ELB health checks, and I can't find any information on why that's happening. When I go to the target group in the console, the only information under 'Targets' is that the health check status is 'unhealthy', and the 'Health status details' just say 'Health checks failed'. How can I find the real reason my health checks are failing? Here's my Terraform code as well, which includes my load balancer, auto scaling group, listener, and target group:
main.tf
resource "aws_lb" "jira-alb" {
name = "jira-alb"
internal = false
load_balancer_type = "application"
security_groups = [aws_security_group.jira_clb_sg.id]
subnets = [var.public_subnet_ids[0], var.public_subnet_ids[1]]
enable_deletion_protection = false
access_logs {
bucket = aws_s3_bucket.this.id
enabled = true
}
tags = {
Environment = "production"
}
}
resource "aws_lb_target_group" "jira" {
name = "jira-tg"
port = 80
protocol = "HTTP"
vpc_id = var.vpc_id
health_check {
enabled = true
healthy_threshold = 10
unhealthy_threshold = 5
interval = 30
timeout = 5
path = "/index.html"
}
stickiness {
type = "lb_cookie"
cookie_duration = 1 ## CANT BE 0.. RANGES FROM 1-604800
}
}
resource "aws_lb_listener" "jira-listener" {
port = 443
protocol = "HTTPS"
ssl_policy = "ELBSecurityPolicy-TLS-1-2-2017-01"
load_balancer_arn = aws_lb.jira-alb.arn
certificate_arn = data.aws_acm_certificate.this.arn ##TODO Change to a variable
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.jira.arn
}
}
resource "aws_autoscaling_group" "this" {
vpc_zone_identifier = var.subnet_ids
health_check_grace_period = 300
health_check_type = "ELB"
force_delete = true
desired_capacity = 2
max_size = 2
min_size = 2
target_group_arns = [aws_lb_target_group.jira.arn]
timeouts {
delete = "15m"
}
launch_template {
id = aws_launch_template.this.id
# version = "$Latest"
version = aws_launch_template.this.latest_version
}
instance_refresh {
strategy = "Rolling"
preferences {
min_healthy_percentage = 50
}
}
}
I was expecting my health checks to pass and my instances to stay running, but the checks keep failing and the instances keep getting terminated and redeployed.
Here are the security groups for my load balancer and my auto scaling group as well:
security_groups.tf
resource "aws_security_group" "jira_clb_sg" {
description = "Allow-Veracode-approved-IPs from external to elb"
vpc_id = var.vpc_id
tags = {
Name = "public-elb-sg-for-jira"
Project = "Jira Module"
ManagedBy = "terraform"
}
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = var.veracode_ips
}
egress {
from_port = 0
to_port = 0
protocol = -1
cidr_blocks = ["0.0.0.0/0"]
}
}
resource "aws_security_group" "jira_sg" {
description = "Allow-Traffic-From-CLB"
vpc_id = var.vpc_id
tags = {
Name = "allow-jira-public-clb-sg"
Project = "Jira Module"
ManagedBy = "terraform"
}
ingress {
from_port = 0
to_port = 0
protocol = -1
security_groups = [aws_security_group.jira_clb_sg.id]
}
egress {
from_port = 0
to_port = 0
protocol = -1
cidr_blocks = ["0.0.0.0/0"]
}
}
My load balancer lets in traffic on port 443, and my auto scaling group's instances allow traffic on any port from the load balancer's security group.
Upvotes: 0
Views: 1215
Reputation: 538
Your health check is on port 80, but your security groups only open port 443.
As described in the official documentation:
"You must ensure that your load balancer can communicate with registered targets on both the listener port and the health check port. Whenever you add a listener to your load balancer or update the health check port for a target group used by the load balancer to route requests, you must verify that the security groups associated with the load balancer allow traffic on the new port in both directions."
Upvotes: 0