Thiago Scodeler

Reputation: 337

AWS ECS agent does not start in EC2 instance

There seems to be an issue with the ECS agent on my ECS cluster. For the past two weeks, the cluster, which runs EC2 instances managed by Auto Scaling (launch templates) and a capacity provider, had been working fine. Now, however, new instances no longer join the cluster because the agent does not start.

Even when I try to start the ECS agent manually on the instance, it hangs.

The Docker service is running properly, and the proper ECS role is attached to the instance. There are no logs for the agent on the instance.

Here's the ECS service status on a freshly launched EC2 instance:

ecs.service - ECS Agent
   Loaded: loaded (/usr/lib/systemd/system/ecs.service; enabled; vendor preset: disabled)
   Active: inactive (dead)

The AMI I'm using is "amzn2-ami-ecs-hvm-2.0.20240319-x86_64-ebs" with the ID "ami-06ebbcdf40f9949e7". I have already tried some newer AMI versions, but I'm facing the same issue.

Upvotes: 2

Views: 1080

Answers (1)

Andre

Reputation: 51

I had the same issue: some instances don't run the ECS agent when they launch. Also, even if the agent does run, you still need to point it at the cluster. I solved it by running the ECS agent as a Docker container, pointed at the cluster, from the user_data in the launch template:

#!/bin/bash
# Run the ECS agent as a container with GPU support (needs the NVIDIA runtime).
# The ECS_CLUSTER value below is a Terraform interpolation (this script was
# templated by Terraform); outside Terraform, substitute your cluster name.
docker run --name ecs-agent \
  --detach=true \
  --restart=on-failure:10 \
  --runtime=nvidia \
  --volume=/var/run/docker.sock:/var/run/docker.sock \
  --volume=/var/log/ecs:/log \
  --volume=/var/lib/ecs/data:/data \
  --net=host \
  --env=ECS_CLUSTER=${aws_ecs_cluster.gpu_cluster.name} \
  --env=ECS_ENABLE_GPU_SUPPORT=true \
  amazon/amazon-ecs-agent:latest

(My image already had Docker and all the other dependencies installed; if yours doesn't, install them first.)
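
If the stock agent on the ECS-optimized AMI does start for you, another way to point it at the cluster is the agent's config file, /etc/ecs/ecs.config, which takes an ECS_CLUSTER=<name> line. Below is a minimal sketch of a user_data helper for that. The function name write_ecs_cluster_config and the cluster name "my-cluster" are made up for illustration, and I have not verified that this fixes the hang described in the question.

```shell
#!/bin/bash
# Sketch: set the cluster name in the ECS agent config file.
# write_ecs_cluster_config is a hypothetical helper, not part of any AWS tooling.

write_ecs_cluster_config() {
  local cluster_name="$1"
  # Default to the config path used on ECS-optimized AMIs.
  local config_file="${2:-/etc/ecs/ecs.config}"

  # Create the config directory if it is missing.
  mkdir -p "$(dirname "$config_file")"

  # Drop any earlier ECS_CLUSTER line so reruns do not leave duplicates.
  if [ -f "$config_file" ]; then
    grep -v '^ECS_CLUSTER=' "$config_file" > "${config_file}.tmp" || true
    mv "${config_file}.tmp" "$config_file"
  fi

  echo "ECS_CLUSTER=${cluster_name}" >> "$config_file"
}
```

In the user_data you would then call write_ecs_cluster_config "my-cluster" (with your own cluster name) before the agent starts. For GPU instances, ECS_ENABLE_GPU_SUPPORT=true can go in the same file.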

Upvotes: 2
