Frankster

Reputation: 699

Step Functions timeout does not stop Fargate task

I have a step function that orchestrates a combination of Lambdas and Fargate tasks. This step function has a timeout set to 24 hours. However, when the timeout fires, it does not cancel the Fargate task that is already running.

I previously looked into sending success and failure messages back to the step function, and that works great. I can't figure out how to solve the timeout issue, though.

The current config looks like this:

    {
      "StartAt": "Fargate-task",
      "States": {
        "Fargate-task": {
          "Next": "pass-state",
          "Catch": [
            {
              "ErrorEquals": [
                "States.ALL"
              ],
              "Next": "fail-state"
            }
          ],
          "Type": "Task",
          "TimeoutSeconds": 60,
          "ResultPath": "$.extractor_output",
          "Resource": "arn:aws:states:::ecs:runTask.waitForTaskToken"
        },
        "pass-state": {
          "Type": "Pass",
          "Next": "Lambda Worker"
        },
        "Lambda Worker": {
          "Type": "Map",
          "End": true,
          "Iterator": {
            "StartAt": "LambdaWorker",
            "States": {
              "LambdaWorker": {
                "End": true,
                "Retry": [
                  {
                    "ErrorEquals": [
                      "Lambda.ServiceException",
                      "Lambda.AWSLambdaException",
                      "Lambda.SdkClientException"
                    ],
                    "IntervalSeconds": 2,
                    "MaxAttempts": 6,
                    "BackoffRate": 2
                  }
                ],
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {
                  "FunctionName": "<LAMBDA>",
                  "Payload.$": "$"
                }
              }
            }
          },
          "ItemsPath": "$.extractor_output",
          "MaxConcurrency": 10
        },
        "fail-state": {
          "Type": "Fail"
        }
      }
    }

Upvotes: 2

Views: 1370

Answers (1)

Anis Smail

Reputation: 779

One way to handle it would be to catch the timeout error and then issue a command to stop the Fargate task.

The catch part works like this example from the docs (https://docs.aws.amazon.com/step-functions/latest/dg/concepts-error-handling.html):

    {
       "Comment": "A Hello World example of the Amazon States Language using an AWS Lambda function",
       "StartAt": "HelloWorld",
       "States": {
          "HelloWorld": {
             "Type": "Task",
             "Resource": "arn:aws:lambda:us-east-1:123456789012:function:sleep10",
             "TimeoutSeconds": 2,
             "Catch": [ {
                "ErrorEquals": ["States.Timeout"],
                "Next": "fallback"
             } ],
             "End": true
          },
          "fallback": {
             "Type": "Pass",
             "Result": "Hello, AWS Step Functions!",
             "End": true
          }
       }
    }
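
Applied to the state machine from the question, a rough sketch could look like the one below. It is not a tested definition. The `stop-fargate-task` state, the `<STOP_TASK_LAMBDA>` placeholder, and the payload field names (`executionId`, `input`) are all hypothetical; the Lambda behind it would have to locate the running task and call the ECS `StopTask` API itself. The ECS `Parameters` are omitted here just as they are in the question, and `pass-state` is ended early to keep the sketch short.

```json
{
  "Comment": "Sketch only: stop-fargate-task and <STOP_TASK_LAMBDA> are hypothetical names",
  "StartAt": "Fargate-task",
  "States": {
    "Fargate-task": {
      "Type": "Task",
      "Resource": "arn:aws:states:::ecs:runTask.waitForTaskToken",
      "TimeoutSeconds": 60,
      "ResultPath": "$.extractor_output",
      "Catch": [
        {
          "Comment": "Must come before States.ALL, since the first matching catcher wins",
          "ErrorEquals": ["States.Timeout"],
          "ResultPath": "$.error",
          "Next": "stop-fargate-task"
        },
        {
          "ErrorEquals": ["States.ALL"],
          "Next": "fail-state"
        }
      ],
      "Next": "pass-state"
    },
    "stop-fargate-task": {
      "Comment": "Hypothetical cleanup step: the Lambda is assumed to stop the ECS task",
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "<STOP_TASK_LAMBDA>",
        "Payload": {
          "executionId.$": "$$.Execution.Id",
          "input.$": "$"
        }
      },
      "Next": "fail-state"
    },
    "pass-state": {
      "Type": "Pass",
      "End": true
    },
    "fail-state": {
      "Type": "Fail"
    }
  }
}
```

Setting `ResultPath` to `$.error` on the timeout catcher keeps the original state input next to the error details, so the cleanup Lambda still receives whatever the Fargate state was given. Note that, as far as I know, a `Catch` only applies to the `TimeoutSeconds` set on the task state itself; a timeout set at the top level of the state machine fails the whole execution and cannot be caught by a state.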

Upvotes: 1
