Reputation: 699
I have a step function that orchestrates a combination of Lambdas and Fargate tasks. This step function has a timeout set to 24 hours. However, the timeout is not propagated to the Fargate task: when the step function times out, the already-running task is not canceled.
I previously looked at this to send success and failure messages back to the step function, which works great. I can't figure out how to solve the timeout issue, though.
The current config looks like this:
{
  "StartAt": "Fargate-task",
  "States": {
    "Fargate-task": {
      "Next": "pass-state",
      "Catch": [
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "Next": "fail-state"
        }
      ],
      "Type": "Task",
      "TimeoutSeconds": 60,
      "ResultPath": "$.extractor_output",
      "Resource": "arn:aws:states:::ecs:runTask.waitForTaskToken"
    },
    "pass-state": {
      "Type": "Pass",
      "Next": "Lambda Worker"
    },
    "Lambda Worker": {
      "Type": "Map",
      "End": true,
      "Iterator": {
        "StartAt": "LambdaWorker",
        "States": {
          "LambdaWorker": {
            "End": true,
            "Retry": [
              {
                "ErrorEquals": [
                  "Lambda.ServiceException",
                  "Lambda.AWSLambdaException",
                  "Lambda.SdkClientException"
                ],
                "IntervalSeconds": 2,
                "MaxAttempts": 6,
                "BackoffRate": 2
              }
            ],
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {
              "FunctionName": "<LAMBDA>",
              "Payload.$": "$"
            }
          }
        }
      },
      "ItemsPath": "$.extractor_output",
      "MaxConcurrency": 10
    },
    "fail-state": {
      "Type": "Fail"
    }
  }
}
Upvotes: 2
Views: 1370
Reputation: 779
One way to handle this is to catch the timeout error (States.Timeout) and then issue a command to stop the Fargate task.
The catch part looks like this example from the docs (https://docs.aws.amazon.com/step-functions/latest/dg/concepts-error-handling.html):
{
  "Comment": "A Hello World example of the Amazon States Language using an AWS Lambda function",
  "StartAt": "HelloWorld",
  "States": {
    "HelloWorld": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:sleep10",
      "TimeoutSeconds": 2,
      "Catch": [ {
        "ErrorEquals": ["States.Timeout"],
        "Next": "fallback"
      } ],
      "End": true
    },
    "fallback": {
      "Type": "Pass",
      "Result": "Hello, AWS Step Functions!",
      "End": true
    }
  }
}
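The docs example only shows the catch; the "kill the Fargate task" half could be a small Lambda that the fallback state invokes. The following is a rough sketch, not a tested setup. It assumes the state input carries the cluster name and task ARN under the hypothetical keys cluster and task_arn; how the ARN gets recorded is up to you and is not shown here. The only AWS call used is the boto3 ECS client's stop_task.

```python
def stop_fargate_task(event, ecs_client=None):
    """Stop the ECS/Fargate task named in the event.

    Assumption: the event has the hypothetical keys "cluster" and
    "task_arn"; adapt these to however your state input is shaped.
    """
    if ecs_client is None:
        # Imported lazily so the function can be exercised with a stub client.
        import boto3
        ecs_client = boto3.client("ecs")

    response = ecs_client.stop_task(
        cluster=event["cluster"],
        task=event["task_arn"],
        reason="Step Functions state timed out",
    )
    task = response["task"]
    return {"stopped_task": task["taskArn"], "last_status": task.get("lastStatus")}


def lambda_handler(event, context):
    # Entry point for the Lambda that the fallback state would invoke.
    return stop_fargate_task(event)
```

To use it, the fallback state in the example above would become a Task state that invokes this Lambda instead of a Pass state, and the Lambda's role would need permission for ecs:StopTask.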
Upvotes: 1