Reputation: 145
I am trying to use a custom AMI in AWS Batch. The AMI has already been configured to be Batch-compatible, but the ECS container won't start. When I submit a Batch job to a compute environment that uses the AMI, the job gets stuck in the "Runnable" state. When I log into the container instance and view /var/log/ecs-agent.log, I see the messages below. This is my first time trying a custom AMI in Batch, so I'm not sure where the error is coming from, and I haven't been able to find any answers online.
level=info time=2021-08-05T20:35:31Z msg="Successfully got ECS instance credentials from provider: EC2RoleProvider" module=instancecreds.go
level=info time=2021-08-05T20:35:31Z msg="Loading configuration" module=agent.go
level=warn time=2021-08-05T20:35:31Z msg="Unable to fetch user data: EC2MetadataError: failed to make EC2Metadata request\n\tstatus code: 404, request id: \ncaused by: <?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"\n\t\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\" lang=\"en\">\n <head>\n <title>404 - Not Found</title>\n </head>\n <body>\n <h1>404 - Not Found</h1>\n </body>\n</html>\n" module=config.go
level=info time=2021-08-05T20:35:31Z msg="Amazon ECS agent Version: 1.54.1, Commit: 3e20420f" module=agent.go
level=info time=2021-08-05T20:35:31Z msg="Successfully got ECS instance credentials from provider: EC2RoleProvider" module=instancecreds.go
level=info time=2021-08-05T20:35:31Z msg="Successfully got ECS instance credentials from provider: EC2RoleProvider" module=instancecreds.go
level=info time=2021-08-05T20:35:31Z msg="Image excluded from cleanup: amazon/amazon-ecs-pause:0.1.0" module=docker_image_manager.go
level=info time=2021-08-05T20:35:31Z msg="Image excluded from cleanup: amazon/amazon-ecs-pause:0.1.0" module=docker_image_manager.go
level=info time=2021-08-05T20:35:31Z msg="Image excluded from cleanup: amazon/amazon-ecs-agent:latest" module=docker_image_manager.go
level=info time=2021-08-05T20:35:31Z msg="Creating root ecs cgroup: /ecs" module=init_linux.go
level=info time=2021-08-05T20:35:31Z msg="Creating cgroup /ecs" module=cgroup_controller_linux.go
level=warn time=2021-08-05T20:35:31Z msg="Disabling TaskCPUMemLimit because agent is unabled to setup '/ecs' cgroup: cgroup create: unable to create controller: mkdir /sys/fs/cgroup/systemd/ecs: read-only file system" module=agent_unix.go
level=info time=2021-08-05T20:35:31Z msg="Event stream ContainerChange start listening..." module=eventstream.go
level=info time=2021-08-05T20:35:31Z msg="Loading state!" module=state_manager.go
level=info time=2021-08-05T20:35:32Z msg="Registering Instance with ECS" module=agent.go
level=info time=2021-08-05T20:35:32Z msg="Remaining mem: 7455" module=client.go
level=error time=2021-08-05T20:35:52Z msg="Unable to register as a container instance with ECS: RequestError: send request failed\ncaused by: Post \"https://ecs.us-east-1.amazonaws.com/\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)" module=client.go
level=error time=2021-08-05T20:35:52Z msg="Error registering: RequestError: send request failed\ncaused by: Post \"https://ecs.us-east-1.amazonaws.com/\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)" module=agent.go
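Most of this log is routine startup output. A small sketch for isolating the warn and error entries, assuming the key=value line layout shown above; the function name `agent_problems` is made up for illustration:

```shell
#!/bin/bash
# Sketch: show only the warn/error entries of an ECS agent log.
# Assumes every entry starts with "level=<name> " as in the log above.
# agent_problems is a hypothetical helper name, not part of any AWS tooling.
agent_problems() {
  # $1: path to the agent log, e.g. /var/log/ecs-agent.log
  # cut trims very long messages (such as the embedded 404 page) to 200 characters.
  grep -E '^level=(warn|error) ' "$1" | cut -c1-200
}

# Usage on the instance (not run here):
#   agent_problems /var/log/ecs-agent.log
```

On the log above this leaves four entries: the user-data 404 warning, the cgroup warning, and the two registration timeouts against https://ecs.us-east-1.amazonaws.com/.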
Upvotes: 0
Views: 2320
Reputation: 145
Resolved: The ECS Agent was not installed properly in my custom AMI.
The ultimate solution to running my custom AMI in Batch was to create a Launch Template with the following script in the User Data section. This runs the Batch-compatibility setup when the instance starts. The Launch Template can then be specified in the Compute Environment in Batch.
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="

--==MYBOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
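# User data runs without a TTY, so install non-interactively to avoid prompts.
sudo DEBIAN_FRONTEND=noninteractive apt-get install -y iptables-persistent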
sudo iptables -t nat -A PREROUTING -p tcp -d 169.254.170.2 --dport 80 -j DNAT --to-destination 127.0.0.1:51679
sudo iptables -t nat -A OUTPUT -d 169.254.170.2 -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 51679
sudo mkdir -p /etc/ecs && sudo touch /etc/ecs/ecs.config
mkdir -p /var/log/ecs /var/lib/ecs/data
cat <<EOF >>/etc/ecs/ecs.config
ECS_DATADIR=/data
ECS_ENABLE_TASK_IAM_ROLE=true
ECS_ENABLE_TASK_IAM_ROLE_NETWORK_HOST=true
ECS_LOGFILE=/log/ecs-agent.log
ECS_AVAILABLE_LOGGING_DRIVERS=["json-file","awslogs"]
ECS_LOGLEVEL=info
ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE=true
EOF
cat <<EOF >>/etc/systemd/system/ecs-agent.service
[Unit]
Description=AWS ECS Agent
Requires=docker.service
After=docker.service
[Service]
TimeoutStartSec=0
RestartSec=10
Restart=always
KillMode=none
ExecStartPre=/usr/bin/docker pull amazon/amazon-ecs-agent:latest
ExecStart=/usr/bin/docker run --name %n \
--restart=on-failure:10 \
--volume=/var/run/docker.sock:/var/run/docker.sock \
--volume=/var/log/ecs:/log \
--volume=/var/lib/ecs/data:/data \
--net=host \
--env-file=/etc/ecs/ecs.config \
--env=ECS_LOGFILE=/log/ecs-agent.log \
--env=ECS_DATADIR=/data/ \
--env=ECS_ENABLE_TASK_IAM_ROLE=true \
--env=ECS_ENABLE_TASK_IAM_ROLE_NETWORK_HOST=true \
--env=ECS_IMAGE_CLEANUP_INTERVAL=10m \
--env=ECS_IMAGE_MINIMUM_CLEANUP_AGE=20m \
--env=ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION=1h \
--env=ECS_NUM_IMAGES_DELETE_PER_CYCLE=10 \
amazon/amazon-ecs-agent:latest
[Install]
WantedBy=multi-user.target
EOF
systemctl enable --now --no-block ecs-agent.service
--==MYBOUNDARY==--
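For anyone wiring this up from the command line, here is a rough sketch of how the user data above might be sanity-checked and turned into launch template data. Batch expects launch template user data in MIME multi-part format, and the EC2 API expects it base64-encoded. The file name `userdata.mime`, the template name `batch-custom-ami`, and both function names are assumptions for illustration, not something from my actual setup.

```shell
#!/bin/bash
# Sketch: check a MIME multi-part user-data file, then build the JSON payload
# for `aws ec2 create-launch-template`. All helper names here are hypothetical.

# Succeeds only if the boundary declared in the multipart header is also used
# for at least one part delimiter and for the closing delimiter.
check_mime_userdata() {
  local file="$1" boundary
  boundary=$(sed -n 's|^Content-Type: multipart/mixed; boundary="\(.*\)"$|\1|p' "$file" | head -n 1)
  [ -n "$boundary" ] || { echo "no multipart boundary declared" >&2; return 1; }
  grep -qxF -- "--${boundary}" "$file" || { echo "no part delimiter found" >&2; return 1; }
  grep -qxF -- "--${boundary}--" "$file" || { echo "no closing delimiter found" >&2; return 1; }
}

# Prints launch template data JSON with the file base64-encoded on one line.
launch_template_data() {
  local file="$1" encoded
  encoded=$(base64 < "$file" | tr -d '\n')
  printf '{"UserData":"%s"}\n' "$encoded"
}

# Usage (needs AWS credentials; not run here):
#   check_mime_userdata userdata.mime &&
#   aws ec2 create-launch-template \
#     --launch-template-name batch-custom-ami \
#     --launch-template-data "$(launch_template_data userdata.mime)"
```

The resulting template is then referenced from the `launchTemplate` field of the compute environment's compute resources.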
Upvotes: 1