Reputation: 5016
I am using Terraform to set up a small Fargate cluster of three Apache server tasks. The tasks hang in PENDING, then the cluster stops them and creates new PENDING tasks, and the cycle continues.
The AWS docs say it could be because:
- The Docker daemon is unresponsive
The docs say to set up CloudWatch to see CPU usage and increase the container size if needed (see the sketch after this list). I have upped the CPU/memory to 1024/2048, which didn't fix the problem.
- The Docker image is large
Unlikely? The image is nothing but httpd:2.4.
- The ECS container agent lost connectivity with the Amazon ECS service in the middle of a task launch
The docs provide some commands to run on the container instance. To do that, it looks like I have to either set up AWS Systems Manager or SSH in directly. I will take this route if I can't find any problems with my Terraform config.
- The ECS container agent takes a long time to stop an existing task
Unlikely, because I am launching a completely new ECS cluster.
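For completeness, the per-task CPU/memory metrics mentioned above can be collected by enabling Container Insights on the cluster. A minimal sketch against my cluster resource below; the setting block is the only addition:

resource "aws_ecs_cluster" "main" {
  name = "main-ecs-cluster"

  # Ship task-level CPU/memory metrics to CloudWatch so a resource
  # bottleneck would show up without guessing at sizes.
  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}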
Below are the ECS and IAM sections of my Terraform file. Why might my Fargate tasks be stuck on pending?
#
# ECS
#

resource "aws_ecs_cluster" "main" {
  name = "main-ecs-cluster"
}

resource "aws_ecs_task_definition" "app" {
  family                   = "app"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = 256
  memory                   = 512
  execution_role_arn       = aws_iam_role.task_execution.arn
  task_role_arn            = aws_iam_role.task_execution.arn

  container_definitions = <<DEFINITION
[
  {
    "image": "httpd:2.4",
    "cpu": 256,
    "memory": 512,
    "name": "app",
    "networkMode": "awsvpc",
    "portMappings": [
      {
        "containerPort": 80,
        "hostPort": 80,
        "protocol": "tcp"
      }
    ]
  }
]
DEFINITION
}

resource "aws_ecs_service" "main" {
  name            = "tf-ecs-service"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  network_configuration {
    security_groups = [aws_security_group.main.id]
    subnets = [
      aws_subnet.public1.id,
      aws_subnet.public2.id,
    ]
  }
}

#
# IAM
#

resource "aws_iam_role" "task_execution" {
  name               = "my-first-service-task-execution-role"
  assume_role_policy = data.aws_iam_policy_document.task_execution.json
}

data "aws_iam_policy_document" "task_execution" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }
  }
}

resource "aws_iam_role_policy_attachment" "task_execution" {
  role       = aws_iam_role.task_execution.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}
Upvotes: 3
Views: 8597
Reputation: 681
A public subnet / public IP may not be the right solution, for security reasons.
Consider placing your tasks in private subnets.
Or you can use a better solution: VPC endpoints (AWS PrivateLink), which let tasks in private subnets pull images from ECR without any route to the internet.
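A minimal sketch of the endpoint approach, assuming the image has been pushed to ECR and that the hypothetical aws_vpc.main, aws_subnet.private1/private2, aws_route_table.private, and aws_security_group.endpoints (allowing HTTPS from the tasks) resources exist:

data "aws_region" "current" {}

# Interface endpoints for the ECR API and the Docker registry.
resource "aws_vpc_endpoint" "ecr_api" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.${data.aws_region.current.name}.ecr.api"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = [aws_subnet.private1.id, aws_subnet.private2.id]
  security_group_ids  = [aws_security_group.endpoints.id]
  private_dns_enabled = true
}

resource "aws_vpc_endpoint" "ecr_dkr" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.${data.aws_region.current.name}.ecr.dkr"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = [aws_subnet.private1.id, aws_subnet.private2.id]
  security_group_ids  = [aws_security_group.endpoints.id]
  private_dns_enabled = true
}

# ECR stores image layers in S3, so a gateway endpoint is needed too.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${data.aws_region.current.name}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id]
}

If the tasks log via awslogs, a CloudWatch Logs interface endpoint would be needed as well.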
Upvotes: 5
Reputation: 238957
Based on the discussion in the comments, it was determined that the issue is caused by the lack of internet access for the Fargate tasks.
This is because the tasks effectively run in a private subnet (no public IP is assigned), while the task uses the httpd
image from Docker Hub, and pulling images from the Hub requires internet access.
Possible solutions are using a NAT gateway/instance, running the tasks in a public subnet with public IPs assigned, or hosting a custom image in ECR.
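For the public-subnet route, the quickest fix is assigning public IPs to the tasks. A sketch against the service from the question, assuming the subnets actually route to an internet gateway; only assign_public_ip is new:

resource "aws_ecs_service" "main" {
  name            = "tf-ecs-service"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  network_configuration {
    security_groups = [aws_security_group.main.id]
    subnets         = [aws_subnet.public1.id, aws_subnet.public2.id]

    # Fargate tasks get no public IP by default, so the awsvpc ENI
    # cannot reach Docker Hub and the image pull never completes.
    assign_public_ip = true
  }
}

For the NAT gateway option, a sketch assuming a hypothetical aws_route_table.private associated with private task subnets:

resource "aws_eip" "nat" {
  domain = "vpc"
}

resource "aws_nat_gateway" "main" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public1.id # the NAT gateway itself lives in a public subnet
}

# Default route so the private task subnets egress through the NAT gateway.
resource "aws_route" "private_default" {
  route_table_id         = aws_route_table.private.id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.main.id
}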
Upvotes: 6