Diego Marquez

Reputation: 492

All tasks on an ECS service stuck in PROVISIONING state

I'm trying to set up a service that launches 20 single-container tasks behind an Application Load Balancer. The problem is that every task stays stuck in PROVISIONING.

The service has logged an error saying that "service service_name is unable to consistently start tasks successfully", which does not seem very helpful since the documentation basically explains that the task launch failed and there were many retries.

My cluster uses an Auto Scaling group capacity provider whose launch template uses an ECS-enabled AMI, with a role that has the AmazonEC2ContainerServiceforEC2Role policy attached to it. The instance type is t2.micro (I also tried t2.small, with the same result).

Can anybody help me troubleshoot this situation? Could the task definition be the cause? Thanks in advance.

Upvotes: 6

Views: 33751

Answers (6)

Anirudh

Reputation: 19

I faced the same problem. Here are the details and the fixes that worked for me.

Problem: tasks would not provision in ECS on an EC2 cluster backed by a custom ASG, although the same task worked fine on Fargate.

Solution:
  1. Check that the CPU and memory required by the task definition are less than what the instance type provides. If not, fix it.
  2. Set the network mode for the container to default. This is chosen when declaring the task definition. In my case neither awsvpc nor bridge worked, but default did, even though under the hood default is bridge.
  3. Also uncheck Availability Zone rebalancing, just in case.

Upvotes: 0

Mike McCartin

Reputation: 179

For the benefit of others that might come across this question searching for why their ECS task/service might be stuck on PROVISIONING with CloudFormation on CREATE_IN_PROGRESS:

Make sure you haven't re-used a launch template from a different ECS cluster. As the other answers point out: there is a bash script in the Advanced section at the bottom of the template creation page that contains information specific to one cluster. You can re-use it, but you must change the cluster name. You will know if you have it set up correctly if the instance appears in the Infrastructure tab in the cluster on ECS.

Update:

Returning to this answer with another possible reason: If your container wants to start in an availability zone without capacity (in my case I wanted us-east-1a but my single test instance was in us-east-1f), the task will remain on PROVISIONING. The instance was in an ASG with available capacity and it didn't automatically add an instance in that AZ. After starting 5 new instances (one of them ended up in the correct zone), the task moved from PROVISIONING to RUNNING on its own.

Update 11/24:

(I'm writing this in retrospect after having my attention called back to this answer by an upvote, so I don't recall if this caused the PROVISIONING issue, but it is closely related.)

Another thing that could potentially cause this that is related to the first update:

If you are able to start one task on a given instance in your cluster, and you have enough CPU and memory available for the additional task, you may be limited by the number of Elastic Network Interfaces (ENIs) allowed to be attached to that instance. You have to enable ENI trunking. Similar to the above, I believe the only symptom of this is tasks stuck in the PROVISIONING state.
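The ENI arithmetic behind this can be sketched as follows, assuming awsvpc network mode without trunking, where the instance keeps one ENI for itself and every task needs its own. The function name is hypothetical and the limit of 3 is only an example value; the real limit depends on the instance type.

```shell
#!/bin/bash
# Hypothetical helper: how many awsvpc-mode tasks fit on one instance
# without ENI trunking, given the instance type's maximum ENI count.
# One ENI is taken by the instance itself; each task needs one more.
max_awsvpc_tasks() {
  local max_enis=$1
  echo $(( max_enis - 1 ))
}

# Example with an assumed limit of 3 ENIs (look up the value for your type).
max_awsvpc_tasks 3
```

If the result is lower than the number of tasks you expect per instance, trunking is worth a look. It is switched on through the awsvpcTrunking account setting (for example with `aws ecs put-account-setting --name awsvpcTrunking --value enabled`) and is only available on some instance types.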

Upvotes: 1

Rabin bhandari

Reputation: 11

In my situation, I revised the user data for the EC2 instance:

[settings.ecs]
cluster = "YourClusterName"

After this update, the Cluster overview showed one registered container instance, and the problem was resolved.

Upvotes: 1

RagazziD

Reputation: 79

You need to attach the ecsInstanceRole IAM role to the LC (Launch Configuration) so the instance can register with the ECS cluster, AND set the user data to:

#!/bin/bash
echo ECS_CLUSTER=YOUR_CLUSTER_NAME_HERE >> /etc/ecs/ecs.config

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/instance_IAM_role.html
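To confirm that the user data actually ran, one option is to read the cluster name back out of the config file on the instance. This is a minimal sketch: the function name and the cluster name my-cluster are hypothetical, and taking the last matching line is an assumption made because the user data appends with >>.

```shell
#!/bin/bash
# Print the cluster name found in an ECS agent config file.
# The path is a parameter so the function can be tried on a copy.
configured_cluster() {
  local config_file=${1:-/etc/ecs/ecs.config}
  # Use the last ECS_CLUSTER= line in case the file has more than one.
  grep '^ECS_CLUSTER=' "$config_file" | tail -n 1 | cut -d= -f2-
}

# Demonstration against a temporary file with a hypothetical cluster name.
tmp=$(mktemp)
echo "ECS_CLUSTER=my-cluster" >> "$tmp"
configured_cluster "$tmp"
rm -f "$tmp"
```

On a real instance, calling `configured_cluster` with no argument reads /etc/ecs/ecs.config. An empty result suggests the user data never wrote the line.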

Upvotes: 7

Sidnei Bernardo

Reputation: 353

In my case the task was stuck in the PROVISIONING state because the task definition required 16GB of memory, but the EC2 instances in the Auto Scaling group had only 15GB available. I changed the memory in the task definition to 15GB and then I could start a task.
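The comparison described here can be sketched as a small check. The function name is hypothetical, the CPU figures are made up for illustration, and the memory values are simply 16GB and 15GB expressed in MiB.

```shell
#!/bin/bash
# Hypothetical helper: does a task's reservation fit the instance's free resources?
# Arguments: task CPU units, task memory (MiB), free CPU units, free memory (MiB).
task_fits() {
  local task_cpu=$1 task_mem=$2 free_cpu=$3 free_mem=$4
  if [ "$task_cpu" -le "$free_cpu" ] && [ "$task_mem" -le "$free_mem" ]; then
    echo "fits"
  else
    echo "does not fit"
  fi
}

# 16GB requested against 15GB free, then 15GB requested against 15GB free.
task_fits 1024 16384 2048 15360
task_fits 1024 15360 2048 15360
```

The free values for a real instance can be read from the remainingResources field returned by `aws ecs describe-container-instances`. Note that the registered memory is usually somewhat below the instance type's nominal size.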

Upvotes: 8

Diego Marquez

Reputation: 492

In the end, I realized that each task must expose an HTTP endpoint called /health that returns 200. The load balancer calls that endpoint to determine whether the container is PROVISIONING or READY.
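A quick way to test such an endpoint by hand is sketched below, assuming curl is installed and that the health check accepts only status 200. Both function names are hypothetical.

```shell
#!/bin/bash
# Print the HTTP status code of <base_url>/health (curl prints 000 if the request fails).
probe_health() {
  local base_url=$1
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 "${base_url}/health"
}

# Map a status code to the verdict a 200-only health check would reach.
health_verdict() {
  if [ "$1" = "200" ]; then
    echo "healthy"
  else
    echo "unhealthy"
  fi
}

# Example with hard-coded status codes, so no running server is needed.
health_verdict 200
health_verdict 503
```

Against a running container the two can be combined, for example `health_verdict "$(probe_health http://localhost:8080)"`, where the host and port are placeholders for wherever the container listens.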

Upvotes: 11
