Reputation: 11931
I have a state machine that consists of a Map state that starts a lot of Fargate tasks (30+), all using a very similar task definition. The only differences between the tasks are the environment variables in the ContainerOverrides block.
Task Definition:
"CalculateTask": {
  "Type": "Task",
  "Resource": "arn:aws:states:::ecs:runTask.sync",
  "Retry": [
    {
      "ErrorEquals": [
        "States.ALL"
      ],
      "IntervalSeconds": 10,
      "MaxAttempts": 2,
      "BackoffRate": 1.5
    }
  ],
  "Parameters": {
    "LaunchType": "FARGATE",
    "Cluster": "arn:aws:ecs:region:111111111:cluster/cluster-name",
    "TaskDefinition": "arn:aws:ecs:region:111111111:task-definition/task-definition:44",
    "NetworkConfiguration": {
      "AwsvpcConfiguration": {
        "Subnets": [
          "subnet-1111111111111111", "subnet-2222222222222222", "subnet-3333333333333333"
        ],
        ...
      }
    },
    "Overrides": {
      "ContainerOverrides": [
        {
          "Name": "Phase-1-start",
          "Environment": [
            {
              "Name": "COMMAND",
              "Value": "calculateGas/Oil/PeakGas..."
            }
          ]
        }
      ]
    }
  }
}
When I run my state machine, tasks keep failing with this StoppedReason:
"StopCode": "TaskFailedToStart",
"StoppedAt": 1618584363236,
"StoppedReason": "Unexpected EC2 error while attempting to Create Network Interface with public IP assignment enabled in subnet 'subnet-2222222222222222': InsufficientFreeAddressesInSubnet",
I don't understand why this issue occurs, since I am supplying three subnet IDs for ECS to choose from.
Upvotes: 2
Views: 6569
Reputation: 33
I had the exact same issue. The root cause ended up being that Fargate tasks I started with run_task were, for some reason, not properly terminating. They were ending up in an "INACTIVE" state and hanging around for months. Because they weren't properly terminating, they weren't releasing their IP addresses in the subnet, so new tasks weren't able to get an IP and would fail.
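One way to confirm that a subnet has run out of addresses is to look at the free IP count EC2 reports for it. Each Fargate task gets its own network interface, so every task that is still hanging around holds one address. The following is a minimal sketch, not something from my original setup: the helper names `free_ip_counts` and `exhausted_subnets` are made up for illustration, and `ec2_client` is assumed to be a boto3 EC2 client.

```python
def free_ip_counts(ec2_client, subnet_ids):
    """Return {subnet_id: free address count} for the given subnets.

    ec2_client is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    response = ec2_client.describe_subnets(SubnetIds=list(subnet_ids))
    return {
        subnet["SubnetId"]: subnet["AvailableIpAddressCount"]
        for subnet in response["Subnets"]
    }


def exhausted_subnets(counts, needed=1):
    """List the subnets that have fewer than `needed` free addresses."""
    return sorted(sid for sid, free in counts.items() if free < needed)
```

Calling `free_ip_counts` with the subnet IDs from the NetworkConfiguration block and then passing the result to `exhausted_subnets` shows which subnets cannot take another task. If a subnet reports zero free addresses while few tasks are supposed to be running, leftover tasks are a likely suspect.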
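To fix, I had to:
1. Go to the ECS Clusters page
2. Open the affected cluster (the one listing the leftover tasks under Running Tasks)
3. Open the Tasks tab
4. Select the [INACTIVE] instances
5. Click Stop to stop the tasks

In addition to cleaning up these inactive instances, I added some extra code/alarming to make sure that this issue wouldn't go undetected: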
import logging
import time

import boto3

_LOGGER = logging.getLogger(__name__)


def invoke_fargate(cw_metrics, cluster_name, YOUR_ARGS_HERE):
    # get_aws_region() and cw_metrics are helpers defined elsewhere (not shown)
    client = boto3.client("ecs", region_name=get_aws_region())
    response = client.run_task(YOUR_CODE_HERE)
    # Honestly not sure if this is required...better safe than sorry?
    _LOGGER.info("Starting to sleep to allow `run_task` a chance to kick off the container")
    time.sleep(30)
    task_arn = response["tasks"][0]["taskArn"]
    # Look the task up again and raise an alarm if it never started
    description = client.describe_tasks(cluster=cluster_name, tasks=[task_arn])
    _LOGGER.info("%s", description)
    for status_dict in description["tasks"]:
        if status_dict.get("stopCode") in ["TaskFailedToStart"]:
            cw_metrics.trigger_alarm("FARGATE_INVOCATION_FAILED")
    _LOGGER.info("Done with Fargate invocation")
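The manual console cleanup described above could probably be scripted as well. The sketch below is an assumption-laden illustration, not what I actually ran: it treats any task that has been running longer than `max_age` as stale, which may or may not match what the console labels as [INACTIVE]. The function names and the `max_age` cutoff are hypothetical, and `ecs_client` is assumed to be a boto3 ECS client.

```python
from datetime import datetime, timedelta, timezone


def stale_task_arns(tasks, max_age, now=None):
    """Return the ARNs of tasks that started more than `max_age` ago.

    `tasks` is the "tasks" list from an ECS describe_tasks response.
    """
    now = now or datetime.now(timezone.utc)
    stale = []
    for task in tasks:
        # Fall back to createdAt for tasks that never reported a start time
        started = task.get("startedAt") or task.get("createdAt")
        if started is not None and now - started > max_age:
            stale.append(task["taskArn"])
    return stale


def stop_stale_tasks(ecs_client, cluster, max_age=timedelta(days=1)):
    """Stop every task in `cluster` older than `max_age`; return their ARNs.

    ecs_client is assumed to be a boto3 ECS client, e.g. boto3.client("ecs").
    """
    stopped = []
    paginator = ecs_client.get_paginator("list_tasks")
    for page in paginator.paginate(cluster=cluster, desiredStatus="RUNNING"):
        arns = page["taskArns"]
        if not arns:
            continue
        described = ecs_client.describe_tasks(cluster=cluster, tasks=arns)
        for arn in stale_task_arns(described["tasks"], max_age):
            ecs_client.stop_task(cluster=cluster, task=arn, reason="Stale task cleanup")
            stopped.append(arn)
    return stopped
```

The age cutoff is only a stand-in for "should have finished long ago". Pick a value comfortably above your longest legitimate run, or the script will stop tasks that are still doing real work.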
Upvotes: 1