vmax

Reputation: 41

ECS Fargate task fails when the service is created; after some tasks fail, a task runs and the service reaches a steady state

I'm using ECS Fargate with an ALB and a target group with instance IP. Whenever my service creates a task, the task fails with the following error. New tasks keep being created after this happens, and at some point a task works and the service reaches a steady state. See the attached image and the task definition below.

I'm getting this error:

Task stopped at: 2023-12-15T06:06:25.165Z ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve ecr registry auth: service call has been retried 3 time(s): RequestError: send request failed caused by: Post "https://api.ecr.ap-south-1.amazonaws.com/": dial tcp 13.234.9.92:443: i/o timeout. Please check your task network configuration.

Similarly, when I create the service with 2 desired tasks, one runs fine and the second fails with the same error. When I specify 3 desired tasks, 2 run fine and the third fails with the same error.

My task definition looks like this:

{
       "taskDefinitionArn": "arn:aws:ecs:ap-south-1:6xxxx5:task-definition/qrc-bg-test-v1-TaskDefinition:2",
       "containerDefinitions": [
           {
               "name": "qrc-bg-test-v1-Container",
               "image": "6xxx5.dkr.ecr.ap-south-1.amazonaws.com/qrc:latest",
               "cpu": 0,
               "links": [],
               "portMappings": [
                   {
                       "containerPort": 5000,
                       "hostPort": 5000,
                       "protocol": "tcp"
                   }
               ],
               "essential": true,
               "entryPoint": [],
               "command": [],
               "environment": [],
               "environmentFiles": [],
               "mountPoints": [],
               "volumesFrom": [],
               "secrets": [],
               "dnsServers": [],
               "dnsSearchDomains": [],
               "extraHosts": [],
               "dockerSecurityOptions": [],
               "dockerLabels": {},
               "ulimits": [],
               "logConfiguration": {
                   "logDriver": "awslogs",
                   "options": {
                       "awslogs-create-group": "true",
                       "awslogs-group": "/ecs/qrc-bg-test-v1TaskDefinition",
                       "awslogs-region": "ap-south-1",
                       "awslogs-stream-prefix": "ecs"
                   },
                   "secretOptions": []
               },
               "systemControls": []
           }
       ],
       "family": "qrc-bg-test-v1-TaskDefinition",
       "taskRoleArn": "arn:aws:iam::6xxxx5:role/ecsTaskExecutionRole",
       "executionRoleArn": "arn:aws:iam::6xxx5:role/ecsTaskExecutionRole",
       "networkMode": "awsvpc",
       "revision": 2,
       "volumes": [],
       "status": "ACTIVE",
       "requiresAttributes": [
           {
               "name": "com.amazonaws.ecs.capability.logging-driver.awslogs"
           },
           {
               "name": "ecs.capability.execution-role-awslogs"
           },
           {
               "name": "com.amazonaws.ecs.capability.ecr-auth"
           },
           {
               "name": "com.amazonaws.ecs.capability.docker-remote-api.1.19"
           },
           {
               "name": "com.amazonaws.ecs.capability.docker-remote-api.1.17"
           },
           {
               "name": "com.amazonaws.ecs.capability.task-iam-role"
           },
           {
               "name": "ecs.capability.execution-role-ecr-pull"
           },
           {
               "name": "com.amazonaws.ecs.capability.docker-remote-api.1.18"
           },
           {
               "name": "ecs.capability.task-eni"
           },
           {
               "name": "com.amazonaws.ecs.capability.docker-remote-api.1.29"
           }
       ],
       "placementConstraints": [],
       "compatibilities": [
           "EC2",
           "FARGATE"
       ],
       "requiresCompatibilities": [
           "FARGATE"
       ],
       "cpu": "1024",
       "memory": "3072",
       "registeredAt": "2023-12-15T06:00:50.419Z",
       "registeredBy": "arn:aws:sts::6xxxxx5:assumed-role/B_Role",
       "tags": []
}

[enter image description here](https://i.sstatic.net/oYlDX.jpg)

Upvotes: 0

Views: 153

Answers (1)

Happy Cappy

Reputation: 38

Since only some of the tasks fail while others work with the same configuration, check whether the service starts tasks in multiple subnets. Some tasks may be landing in subnets that cannot reach the ECR API.

For example, the tasks may be starting in private subnets that have neither internet access nor a VPC endpoint for that API. (Judging by the public IP in the error message, the task is currently trying to reach ECR over the internet.)

Every so often you get lucky and a task starts in a subnet that has the necessary access.
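One way to check this is to look at the route table behind each subnet the service uses. The following is only a rough sketch, not a definitive diagnostic: it takes data shaped like the EC2 DescribeRouteTables response and reports what the default route (0.0.0.0/0) of each subnet points to. The subnet IDs, route table IDs, and the names SAMPLE_ROUTE_TABLES, default_route_target and classify_subnets are made up for illustration; substitute the subnets from your own service's network configuration.

```python
# Hypothetical sample shaped like the "RouteTables" list returned by
# EC2 DescribeRouteTables. All IDs here are invented.
SAMPLE_ROUTE_TABLES = [
    {   # main route table: default route through a NAT gateway
        "RouteTableId": "rtb-main",
        "Associations": [{"Main": True}],
        "Routes": [
            {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
            {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0123"},
        ],
    },
    {   # public subnet: default route through an internet gateway
        "RouteTableId": "rtb-public",
        "Associations": [{"Main": False, "SubnetId": "subnet-aaa"}],
        "Routes": [
            {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
            {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0abc"},
        ],
    },
    {   # isolated subnet: only the local VPC route
        "RouteTableId": "rtb-isolated",
        "Associations": [{"Main": False, "SubnetId": "subnet-bbb"}],
        "Routes": [
            {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
        ],
    },
]


def default_route_target(route_table):
    """Return the target ID of the active 0.0.0.0/0 route, or None."""
    for route in route_table.get("Routes", []):
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        if route.get("State", "active") != "active":
            continue  # blackhole routes do not carry traffic
        return route.get("NatGatewayId") or route.get("GatewayId") or "other"
    return None


def classify_subnets(subnet_ids, route_tables):
    """Map each subnet ID to the kind of default route it uses."""
    main_table = None
    explicit = {}
    for table in route_tables:
        for assoc in table.get("Associations", []):
            if assoc.get("Main"):
                main_table = table
            if assoc.get("SubnetId"):
                explicit[assoc["SubnetId"]] = table

    report = {}
    for subnet_id in subnet_ids:
        # Subnets without an explicit association use the main route table.
        table = explicit.get(subnet_id, main_table)
        target = default_route_target(table) if table else None
        if target is None:
            report[subnet_id] = "no-default-route"
        elif target.startswith("igw-"):
            report[subnet_id] = "internet-gateway"
        elif target.startswith("nat-"):
            report[subnet_id] = "nat-gateway"
        else:
            report[subnet_id] = "other"
    return report


if __name__ == "__main__":
    subnets = ["subnet-aaa", "subnet-bbb", "subnet-ccc"]
    for subnet_id, kind in classify_subnets(subnets, SAMPLE_ROUTE_TABLES).items():
        print(subnet_id, kind)
```

With real data, the route tables could come from `boto3.client("ec2").describe_route_tables()["RouteTables"]`, assuming boto3 is installed and credentials are configured. A subnet reported as "no-default-route" would need VPC endpoints for ECR (the ecr.api and ecr.dkr interface endpoints, plus an S3 gateway endpoint for the image layers) before a task there can pull the image. A subnet reported as "internet-gateway" only works if the task is also assigned a public IP.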

Upvotes: 1
