gauthama

Reputation: 51

ELB Health Checks Failing for Running AWS ECS Container

I'm currently trying to deploy an application onto AWS ECS via CloudFormation templates. The Docker image is stored in AWS ECR and deployed into an ECS service fronted by an Application Load Balancer.

My service starts, and my load balancer is created, but the tasks inside the ECS service repeatedly fail with the error:

Task failed ELB health checks in (target-group arn:aws:elasticloadbalancing:us-east-1:...

I've checked my security groups - the ECS service security group allows ingress from the load balancer security group, and the load balancer is created successfully.

I've manually tried pulling my image from ECR and running it - no issues there. What am I missing? My template is below.


Resources:
  ECSRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
        - Effect: Allow
          Principal:
            Service: [ecs.amazonaws.com]
          Action: ['sts:AssumeRole']
      Path: /
      Policies:
      - PolicyName: ecs-service
        PolicyDocument:
          Statement:
          - Effect: Allow
            Action:
              # Rules which allow ECS to attach network interfaces to instances
              # on your behalf in order for awsvpc networking mode to work right
              - 'ec2:AttachNetworkInterface'
              - 'ec2:CreateNetworkInterface'
              - 'ec2:CreateNetworkInterfacePermission'
              - 'ec2:DeleteNetworkInterface'
              - 'ec2:DeleteNetworkInterfacePermission'
              - 'ec2:Describe*'
              - 'ec2:DetachNetworkInterface'

              # Rules which allow ECS to update load balancers on your behalf
              # with the information about how to send traffic to your containers
              - 'elasticloadbalancing:DeregisterInstancesFromLoadBalancer'
              - 'elasticloadbalancing:DeregisterTargets'
              - 'elasticloadbalancing:Describe*'
              - 'elasticloadbalancing:RegisterInstancesWithLoadBalancer'
              - 'elasticloadbalancing:RegisterTargets'
            Resource: '*'

  # This is a role which is used by the ECS tasks themselves.
  ECSTaskExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
        - Effect: Allow
          Principal:
            Service: [ecs-tasks.amazonaws.com]
          Action: ['sts:AssumeRole']
      Path: /
      Policies:
        - PolicyName: AmazonECSTaskExecutionRolePolicy
          PolicyDocument:
            Statement:
            - Effect: Allow
              Action:
                # Allow the ECS Tasks to download images from ECR
                - 'ecr:GetAuthorizationToken'
                - 'ecr:BatchCheckLayerAvailability'
                - 'ecr:GetDownloadUrlForLayer'
                - 'ecr:BatchGetImage'

                # Allow the ECS tasks to upload logs to CloudWatch
                - 'logs:CreateLogStream'
                - 'logs:PutLogEvents'
              Resource: '*'
  TaskDef:
    Type: AWS::ECS::TaskDefinition
    Properties: 
      Cpu: 4096    
      Memory: 30720
      ContainerDefinitions: 
        - Image: !Ref ECRImageUrl
          Name: !Sub "${ProjectName}-ecsContainer"
          PortMappings: 
            - ContainerPort: 4000
              HostPort: 4000
              Protocol: tcp
      Family: !Sub "${ProjectName}-taskDef"
      ExecutionRoleArn: !Ref ECSTaskExecutionRole
      RequiresCompatibilities: 
        - FARGATE
      NetworkMode: awsvpc

  Cluster:
    Type: AWS::ECS::Cluster
    Properties:
      ClusterName: !Sub "${ProjectName}-ECSCluster"

  Service:
    Type: AWS::ECS::Service
    DependsOn:
      - LoadBalancerListener
    Properties:
      Cluster: !Ref Cluster
      DesiredCount: 2
      LaunchType: FARGATE
      ServiceName: !Sub "${ProjectName}-ECSService"
      TaskDefinition: !Ref TaskDef
      NetworkConfiguration:
        AwsvpcConfiguration:
          SecurityGroups: 
            - !Ref FargateContainerSecurityGroup
          AssignPublicIp: ENABLED
          Subnets: !Split [',', {'Fn::ImportValue': !Sub '${VPCStackName}-PublicSubnets'}]
      LoadBalancers:
        - ContainerName: !Sub "${ProjectName}-ecsContainer"
          ContainerPort: 4000
          TargetGroupArn: !Ref TargetGroup

  FargateContainerSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Access to the Fargate containers
      VpcId:
        Fn::ImportValue: 
          !Sub '${VPCStackName}-VPC'
  EcsSecurityGroupIngressFromPublicALB:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      Description: Ingress from the public ALB
      GroupId: !Ref 'FargateContainerSecurityGroup'
      IpProtocol: -1
      SourceSecurityGroupId: !Ref 'PublicLoadBalancerSG'
  EcsSecurityGroupIngressFromSelf:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      Description: Ingress from other containers in the same security group
      GroupId: !Ref 'FargateContainerSecurityGroup'
      IpProtocol: -1
      SourceSecurityGroupId: !Ref 'FargateContainerSecurityGroup'
  PublicLoadBalancerSG:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Access to the public facing load balancer
      VpcId: 
        Fn::ImportValue: 
          !Sub '${VPCStackName}-VPC'
      SecurityGroupIngress:
          - CidrIp: 0.0.0.0/0
            IpProtocol: -1

  ACMCertificate:
    Type: AWS::CertificateManager::Certificate
    Properties: 
      DomainName: !Sub ${ProjectName}.${DomainName}
      ValidationMethod: DNS

  TargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    DependsOn:
      - LoadBalancer
    Properties:
      TargetType: ip
      Name: !Sub "${ProjectName}-ECSService"
      Port: 4000
      Protocol: HTTP
      VpcId: 
        Fn::ImportValue: 
          !Sub '${VPCStackName}-VPC'

  LoadBalancer:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      Scheme: internet-facing
      Subnets: !Split [',', {'Fn::ImportValue': !Sub '${VPCStackName}-PublicSubnets'}]
      SecurityGroups: 
        - !Ref PublicLoadBalancerSG

  LoadBalancerListener:
    Type: AWS::ElasticLoadBalancingV2::Listener
    DependsOn:
      - LoadBalancer
    Properties:
      DefaultActions:
        - TargetGroupArn: !Ref TargetGroup
          Type: 'forward'
      LoadBalancerArn: !Ref LoadBalancer
      Port: 443
      Protocol: HTTP

Upvotes: 0

Views: 13440

Answers (3)

Achintya Ashok

Reputation: 539

After much pain and suffering, I found out that the ALB itself needs to be associated with the security group (SG) that allows traffic on the ports that get dynamically allocated by ECS. You should already have an automatically defined SG that covers those port ranges. Associate this SG with your ALB and your health checks will start passing (assuming everything else is hooked up correctly).

In addition, ensure that your task definition has the network mode set to "bridge" and that the "hostPort" value is set to 0 -- this tells ECS to dynamically allocate a port on the underlying EC2 instance and map it to your container port.
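
A minimal sketch of the pieces this describes, assuming the EC2 launch type (bridge networking and dynamic host ports are not available with Fargate/awsvpc); the resource names, the container instance security group, and port 4000 are illustrative:

  TaskDef:
    Type: AWS::ECS::TaskDefinition
    Properties:
      NetworkMode: bridge
      ContainerDefinitions:
        - Name: app
          Image: !Ref ECRImageUrl
          PortMappings:
            - ContainerPort: 4000
              HostPort: 0          # 0 tells ECS to allocate a dynamic host port
              Protocol: tcp

  # The security group on the container instances must accept the dynamically
  # allocated host ports from the ALB (typically the ephemeral port range).
  InstanceIngressFromALB:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      Description: Ingress from the ALB on dynamically allocated host ports
      GroupId: !Ref ContainerInstanceSecurityGroup   # hypothetical SG for the EC2 hosts
      IpProtocol: tcp
      FromPort: 32768
      ToPort: 65535
      SourceSecurityGroupId: !Ref PublicLoadBalancerSG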

Upvotes: 1

gauthama

Reputation: 51

It turns out that my security groups were not permissive enough. Traffic coming in through a Network Load Balancer is seen as coming from its original source, so if your NLB is open to all traffic, your Fargate containers need to be open as well. This fixed my issue:


  FargateContainerSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Access to the Fargate containers
      VpcId:
        Fn::ImportValue: 
          !Sub '${VPCStackName}-VPC'
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: !Ref ApplicationPort
          ToPort: !Ref ApplicationPort
          CidrIp: 0.0.0.0/0

Upvotes: 4

Prashant Lakhlani

Reputation: 5806

By default, the health check calls / on port 80 and expects a 200 status code in response. It is configurable under EC2 -> Target Groups -> your ECS target group. You have to make sure your port is 4000 and, in the health check settings, adjust the default path and expected response status code.

Additionally, you can always try connecting to your EC2 instance on port 4000 using its public IP or DNS name and see if that works.

If the EC2 instance is not responding on port 4000, troubleshoot the Docker deployment; something is wrong with the task definition or its parameters.

If it is working, then something is wrong with the target group's targets or the health check configuration.
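
For reference, these settings live on the target group itself. A minimal sketch of the question's TargetGroup with the health check pointed at the application port (the /health path and the 200 matcher are just examples to adjust for your application):

  TargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      TargetType: ip
      Port: 4000
      Protocol: HTTP
      HealthCheckProtocol: HTTP
      HealthCheckPort: traffic-port   # check on the same port traffic is sent to
      HealthCheckPath: /health        # example path; use one your app answers with 200
      Matcher:
        HttpCode: '200'
      VpcId:
        Fn::ImportValue:
          !Sub '${VPCStackName}-VPC'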

Hope this helps.

Upvotes: 1
