AWS_Beginner

Reputation: 476

EMR cluster not terminating on failure from Step Functions

We are creating an EMR cluster from a Step Functions state machine. Below are the steps it performs:

  1. Create Cluster
  2. preprocessing (like installing scripts etc)
  3. submit job
  4. terminate cluster

The issue is that when step 3 (submit job) fails, the cluster is not terminated. When the job succeeds, the cluster terminates as expected. Below is the definition of the step (ignore the syntax):

"Submit_Job": {
    "Type": "Task",
    "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
    "Parameters": {
        "ClusterId.$": "$.FormattedInputsForEmr.ClusterId",
        "Step": {
            "Name": "Start Job",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args.$": "$.FormattedInputsForEmr.Args[*][*]"
            }
        }
    }
}

When I try ActionOnFailure = TERMINATE_CLUSTER instead, I get the error below:

Action on Failure 'TERMINATE_CLUSTER' is invalid Status Code: 400 ErrorCode: ValidationException

What am I missing here?

Upvotes: 1

Views: 2564

Answers (2)

Wei Yang

Reputation: 1

I had the same issue, and found out that I had created my EMR cluster with step concurrency enabled (this is visible on the cluster's Steps tab in the EMR console).

Apparently, on an EMR cluster with step concurrency greater than 1, you must add steps with ActionOnFailure = CONTINUE.

See the AWS documentation on this: Considerations for running multiple steps in parallel
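
If step concurrency is the cause, one option is to create the cluster with a step concurrency level of 1 (the EMR default). The following is only a sketch: it assumes the Step Functions createCluster integration accepts StepConcurrencyLevel the same way the EMR RunJobFlow API does, and the cluster name, release label, roles, instance settings and ResultPath are placeholders that do not come from the question.

```json
{
  "Create_Cluster": {
    "Type": "Task",
    "Comment": "Sketch only: name, release label, roles and instance settings are placeholders",
    "Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
    "Parameters": {
      "Name": "example-cluster",
      "ReleaseLabel": "emr-6.10.0",
      "StepConcurrencyLevel": 1,
      "ServiceRole": "EMR_DefaultRole",
      "JobFlowRole": "EMR_EC2_DefaultRole",
      "Instances": {
        "KeepJobFlowAliveWhenNoSteps": true,
        "InstanceGroups": [
          {
            "InstanceRole": "MASTER",
            "InstanceType": "m5.xlarge",
            "InstanceCount": 1
          }
        ]
      }
    },
    "ResultPath": "$.CreateClusterResult",
    "Next": "Submit_Job"
  }
}
```

Whether TERMINATE_CLUSTER is then accepted for the step should be verified against your own cluster; I have not confirmed it here.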

Upvotes: 0

Naveen Santhanavel

Reputation: 439

If your cluster is created and terminated within the state machine itself, another way to achieve this is to use Step Functions error handling: catch the error and route to the terminate step.

For example (note that Catch is a field of the state itself, next to Parameters, not inside it):

"Submit_Job": {
  "Type": "Task",
  "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
  "Parameters": {
    "ClusterId.$": "$.FormattedInputsForEmr.ClusterId",
    "Step": {
      "Name": "Start Job",
      "ActionOnFailure": "CONTINUE",
      "HadoopJarStep": {
        "Jar": "command-runner.jar",
        "Args.$": "$.FormattedInputsForEmr.Args[*][*]"
      }
    }
  },
  "Catch": [
    {
      "ErrorEquals": [
        "States.ALL"
      ],
      "Next": "<Terminate-Cluster-State>"
    }
  ]
}
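
One detail to watch with a Catch: by default the error output replaces the state's input, so ClusterId would no longer be available to the terminate state. A minimal sketch of how the two states might fit together, assuming a terminate state named Terminate_Cluster and the result paths $.SubmitJobResult and $.SubmitJobError (all hypothetical names), with Catch declared on the state itself and ResultPath used to keep the original input:

```json
{
  "Comment": "Sketch only: Terminate_Cluster, SubmitJobResult and SubmitJobError are hypothetical names",
  "StartAt": "Submit_Job",
  "States": {
    "Submit_Job": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
      "Parameters": {
        "ClusterId.$": "$.FormattedInputsForEmr.ClusterId",
        "Step": {
          "Name": "Start Job",
          "ActionOnFailure": "CONTINUE",
          "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args.$": "$.FormattedInputsForEmr.Args[*][*]"
          }
        }
      },
      "ResultPath": "$.SubmitJobResult",
      "Catch": [
        {
          "ErrorEquals": ["States.ALL"],
          "ResultPath": "$.SubmitJobError",
          "Next": "Terminate_Cluster"
        }
      ],
      "Next": "Terminate_Cluster"
    },
    "Terminate_Cluster": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:terminateCluster",
      "Parameters": {
        "ClusterId.$": "$.FormattedInputsForEmr.ClusterId"
      },
      "End": true
    }
  }
}
```

With this shape the execution ends as succeeded even when the job failed, because the error is caught. If the failure should still surface, a Choice state that checks for $.SubmitJobError followed by a Fail state could be added after the terminate step.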

Upvotes: 1
