Austin K

Reputation: 549

STOPPED (CannotPullContainerError: API error (500))?

I'm getting this error when running a task on my AWS Fargate cluster. Has anyone run into this before?

Upvotes: 17

Views: 9430

Answers (7)

manish

Reputation: 974

Assigning a Public IP is mandatory for Fargate. For details see https://github.com/aws/amazon-ecs-agent/issues/1128

Upvotes: 5

enisdenjo

Reputation: 732

A Public IP is not mandatory; what is lacking is the specification for creating a working NAT Gateway. In the GitHub issue, Amazon technicians keep repeating that you "just" need a Private IP + NAT; however, this is not true. I struggled with this a lot myself, but finally got it working properly without using a Public IP for my Fargate services.

To have Fargate services access the internet without a Public IP, you need to set up a VPC which has 2 subnets:

  • A public subnet with an Internet Gateway allowing bidirectional internet access
  • A private subnet with a NAT Gateway allowing only outgoing internet access

You can create such a VPC in 2 ways: by going to Services > VPC > VPC Dashboard, clicking on Launch VPC Wizard and selecting "VPC with Public and Private Subnets"; or manually:

NOTE: All of the following steps are performed in Services > VPC

  1. Go to Your VPCs and Create a VPC
  2. Go to Subnets and Create subnet 2 times
    1. private subnet
      1. Attach it to the VPC in focus. Whatever CIDR block, whatever availability zone you like
    2. public subnet
      1. Attach it to the VPC in focus. Whatever CIDR block, whatever availability zone you like
  3. Go to Internet Gateways and Create internet gateway
    1. Name it however you want
    2. Select the newly created Internet Gateway, Actions, Attach to VPC and attach it to the VPC in focus
  4. Go to NAT Gateways and Create NAT Gateway
    1. Important: Select the public subnet
    2. Create New EIP, or use an existing one if you have one
    3. Wait for the gateway to become Available
  5. Go to Route Tables and Create route table 2 times
    1. private route table
      1. Attach it to the VPC in focus
      2. Back at the list, select the route table
      3. Routes tab on the bottom, Edit routes
      4. Add route, destination: 0.0.0.0/0, target the NAT Gateway created previously and Save routes
      5. Still having the route table selected, Actions and Set Main Route Table (if not already)
    2. public route table
      1. Attach it to the VPC in focus
      2. Back at the list, select the route table
      3. Routes tab on the bottom, Edit routes
      4. Add route, destination: 0.0.0.0/0, target the Internet Gateway created previously and Save routes
      5. Subnet Associations tab on the bottom, Edit subnet associations
      6. Select the public subnet, Save
  6. Put cucumber on eyes.

Every service you put in the public subnet will have bidirectional internet access and every service you put in the private subnet will have only outgoing internet access (yes, Fargate and EC2 services in the private subnet without Public IPs will have internet access).
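To double-check the result of the steps above, here is a minimal Python sketch that classifies a subnet from its route tables. It assumes the data has the same shape as the `RouteTables` list returned by EC2's DescribeRouteTables (keys `Routes`, `Associations`, `DestinationCidrBlock`, `GatewayId`, `NatGatewayId`, `State`). The IDs in `ROUTE_TABLES` and the function names are made up for illustration; no AWS call is made here.

```python
from typing import List, Optional


def default_route_target(route_table: dict) -> Optional[str]:
    """Return the target of the active 0.0.0.0/0 route, or None if there is none."""
    for route in route_table.get("Routes", []):
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        if route.get("State", "active") != "active":
            continue  # e.g. a "blackhole" route left behind by a deleted gateway
        return route.get("NatGatewayId") or route.get("GatewayId")
    return None


def classify_subnet(subnet_id: str, route_tables: List[dict]) -> str:
    """Classify a subnet as 'public', 'private-with-nat', 'isolated' or 'other'."""
    explicit = None
    main = None
    for table in route_tables:
        for assoc in table.get("Associations", []):
            if assoc.get("SubnetId") == subnet_id:
                explicit = table  # explicit association wins
            elif assoc.get("Main"):
                main = table  # fallback for subnets with no explicit association
    table = explicit if explicit is not None else main
    target = default_route_target(table) if table is not None else None
    if target is None:
        return "isolated"
    if target.startswith("igw-"):
        return "public"
    if target.startswith("nat-"):
        return "private-with-nat"
    return "other"


# Hypothetical data mirroring the setup above: the private route table is the
# main one, and the public route table is explicitly associated.
ROUTE_TABLES = [
    {
        "RouteTableId": "rtb-private",
        "Associations": [{"Main": True}],
        "Routes": [
            {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
            {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0abc", "State": "active"},
        ],
    },
    {
        "RouteTableId": "rtb-public",
        "Associations": [{"Main": False, "SubnetId": "subnet-public"}],
        "Routes": [
            {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0abc", "State": "active"},
        ],
    },
]

print(classify_subnet("subnet-public", ROUTE_TABLES))
print(classify_subnet("subnet-private", ROUTE_TABLES))
```

If a subnet you expected to be private comes back as "isolated", the 0.0.0.0/0 route to the NAT Gateway is the first thing to check.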

Upvotes: 5

Shoan

Reputation: 4078

Make sure that your subnet has access to the internet. In my case, the Fargate task was deployed to a private subnet. While this subnet had the NAT gateway configured, the public subnet did not have a route to the internet gateway.

Upvotes: 0

nicolasochem

Reputation: 467

If you are running ECS in a private VPC without Internet access, set up a VPC endpoint for ECR and S3 first.
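As a rough sketch of which endpoints that means, the helper below lists the VPC endpoint service names for pulling from ECR. The S3 endpoint is needed because ECR stores image layers in S3. The function name is made up for illustration, and which of the two ECR endpoints you need can depend on your Fargate platform version, so treat this as a checklist rather than a definitive list.

```python
def ecr_pull_endpoints(region: str) -> dict:
    """Map VPC endpoint service names needed for ECR image pulls to their endpoint type."""
    return {
        f"com.amazonaws.{region}.ecr.api": "Interface",  # ECR API calls
        f"com.amazonaws.{region}.ecr.dkr": "Interface",  # Docker registry traffic
        f"com.amazonaws.{region}.s3": "Gateway",         # image layers are stored in S3
    }


for service_name, endpoint_type in ecr_pull_endpoints("us-east-1").items():
    print(f"{endpoint_type:<10} {service_name}")
```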

Upvotes: 0

Tim Klein

Reputation: 2768

Go to the docs for an answer to this one.

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_cannot_pull_image.html

Since you are encountering a 500 error, I would heed the advice of the first error's description, "Connection timed out":

When a Fargate task is launched, its elastic network interface requires a route to the internet to pull container images. If you receive an error similar to the following when launching a task, it is because a route to the internet does not exist:

CannotPullContainerError: API error (500): Get https://111122223333.dkr.ecr.us-east-1.amazonaws.com/v2/: net/http: request canceled while waiting for connection

To resolve this issue, you can:

  • For tasks in public subnets, specify ENABLED for Auto-assign public IP when launching the task...

  • For tasks in private subnets, specify DISABLED for Auto-assign public IP when launching the task, and configure a NAT Gateway in your VPC to route requests to the internet...
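If you launch tasks from code rather than the console, the two options above map onto the `assignPublicIp` field of the awsvpc network configuration. Here is a minimal Python sketch that builds that structure; the helper name and the subnet and security group IDs are placeholders, and it assumes you already know whether the subnet is public.

```python
from typing import Iterable


def awsvpc_network_configuration(
    subnet_ids: Iterable[str],
    security_group_ids: Iterable[str],
    subnet_is_public: bool,
) -> dict:
    """Build the networkConfiguration structure for an awsvpc-mode task.

    Public subnet  -> assignPublicIp ENABLED (route via the Internet Gateway).
    Private subnet -> assignPublicIp DISABLED (route via the NAT Gateway).
    """
    return {
        "awsvpcConfiguration": {
            "subnets": list(subnet_ids),
            "securityGroups": list(security_group_ids),
            "assignPublicIp": "ENABLED" if subnet_is_public else "DISABLED",
        }
    }


# Placeholder IDs, for illustration only.
config = awsvpc_network_configuration(["subnet-0abc"], ["sg-0abc"], subnet_is_public=False)
print(config["awsvpcConfiguration"]["assignPublicIp"])
```

The resulting dict is what you would pass as `networkConfiguration` to the ECS RunTask or CreateService call, for example through boto3's `run_task` or `create_service`.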

If you are encountering any other issues with ECS tasks not starting, or behaving strangely on startup, check the full list of ECS troubleshooting topics.

I was encountering a similar error (404 instead of 500), however, the Task displayed that it was RUNNING even though the detailed status listed an error.

It turns out that the role associated with the task (same role as the EC2 Instance on which it was running, in this case) could not be assumed by ecs-tasks. Adding the following trust relationship statement to the role resolved the issue:

{
  "Effect": "Allow",
  "Principal": {
    "Service": "ecs-tasks.amazonaws.com"
  },
  "Action": "sts:AssumeRole"
}

See the specific page on the Task Execution Roles for more details.
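For a quick check of whether a role's trust policy already contains a statement like that, here is a small Python sketch. It only inspects a policy document you supply as a dict (for example, pasted from the role's Trust relationships tab); the function name and the sample policy are made up for illustration, and conditions on the statement are ignored.

```python
def allows_ecs_tasks(trust_policy: dict) -> bool:
    """Return True if some Allow statement lets ecs-tasks.amazonaws.com assume the role."""
    statements = trust_policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given without a list
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "sts:AssumeRole" not in actions:
            continue
        principal = statement.get("Principal", {})
        if not isinstance(principal, dict):
            continue
        services = principal.get("Service", [])
        if isinstance(services, str):
            services = [services]
        if "ecs-tasks.amazonaws.com" in services:
            return True
    return False


# Sample trust policy for a role that only EC2 can assume.
ec2_only = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}
print(allows_ecs_tasks(ec2_only))

# The same policy after adding the statement from this answer.
fixed = {
    "Version": "2012-10-17",
    "Statement": ec2_only["Statement"] + [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ecs-tasks.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}
print(allows_ecs_tasks(fixed))
```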

Upvotes: 6

Glabler

Reputation: 255

You have to allocate a public IP to your service. You can do this when you define the service, but as far as I know you cannot change it afterwards from the update menu.

Upvotes: 0

oskarpearson

Reputation: 284

This error occurs when the task is unable to pull the container image from the registry.

  1. Check that you're allocating a public IP address to your containers. Currently the AWS container registry doesn't have an internal in-VPC endpoint.
  2. Check that your containers have a way to connect to the internet (e.g. a NAT instance or similar).
  3. Check that the security group that you have associated with the container allows outbound traffic. If you created the SG with terraform or similar you may find that it's defaulting to having no outbound rules.
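For the third check, here is a minimal Python sketch that tests whether a security group's egress rules allow outbound HTTPS, which is what an image pull needs. It assumes the rules have the shape of the `IpPermissionsEgress` list returned by EC2's DescribeSecurityGroups, and it only looks for rules open to 0.0.0.0/0; a narrower CIDR or a prefix list can also be enough, so a False result here is a hint to look closer, not proof of a problem. The function name is made up for illustration.

```python
from typing import List


def allows_outbound_https(egress_rules: List[dict]) -> bool:
    """Return True if any egress rule allows TCP 443 to 0.0.0.0/0."""
    for rule in egress_rules:
        protocol = rule.get("IpProtocol")
        if protocol == "-1":  # "-1" means all protocols and all ports
            port_ok = True
        elif protocol == "tcp":
            port_ok = rule.get("FromPort", 0) <= 443 <= rule.get("ToPort", 65535)
        else:
            port_ok = False
        open_to_all = any(
            ip_range.get("CidrIp") == "0.0.0.0/0"
            for ip_range in rule.get("IpRanges", [])
        )
        if port_ok and open_to_all:
            return True
    return False


# A group with no outbound rules at all, as in the Terraform case above.
print(allows_outbound_https([]))

# An allow-all outbound rule.
print(allows_outbound_https([{"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]))
```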

Upvotes: 2
