Reputation: 476
We are creating an EMR cluster from a Step Functions state machine. Below are the steps it performs:
The issue is that when step #3 (submit job) fails, the cluster is not terminated. When the job succeeds, the cluster is terminated as expected. Below is the definition of the step (ignore syntax):
"Submit_Job": {
  "Type": "Task",
  "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
  "Parameters": {
    "ClusterId.$": "$.FormattedInputsForEmr.ClusterId",
    "Step": {
      "Name": "Start Job",
      "ActionOnFailure": "CONTINUE",
      "HadoopJarStep": {
        "Jar": "command-runner.jar",
        "Args.$": "$.FormattedInputsForEmr.Args[*][*]"
      }
    }
  }
}
Now, when I try ActionOnFailure=TERMINATE_CLUSTER, it gives me the error below:
Action on Failure 'TERMINATE_CLUSTER' is invalid Status Code: 400 ErrorCode: ValidationException
What am I missing here?
Upvotes: 1
Views: 2564
Reputation: 1
I had the same issue, and found that I had created my EMR cluster with step concurrency enabled (this is visible on the cluster's Steps tab).
Apparently, for an EMR cluster with step concurrency greater than 1, you must add steps with ActionOnFailure = CONTINUE.
See the AWS documentation on this: "Considerations for running multiple steps in parallel".
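If you do not need parallel steps, one option that follows from the above is to create the cluster with a step concurrency of 1, so that TERMINATE_CLUSTER should no longer be rejected. The fragment below is only a sketch of what the cluster-creation state might look like. The state name Create_Cluster, the cluster name, release label, roles, and instance settings are all placeholders I made up, and I am assuming the createCluster integration passes StepConcurrencyLevel through to EMR as the RunJobFlow API does, so check the supported parameters before relying on it.

```json
{
  "Create_Cluster": {
    "Type": "Task",
    "Comment": "Sketch only: every value below is a placeholder; StepConcurrencyLevel support in this integration is an assumption",
    "Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
    "Parameters": {
      "Name": "example-cluster",
      "ReleaseLabel": "emr-6.2.0",
      "StepConcurrencyLevel": 1,
      "ServiceRole": "EMR_DefaultRole",
      "JobFlowRole": "EMR_EC2_DefaultRole",
      "Instances": {
        "KeepJobFlowAliveWhenNoSteps": true,
        "InstanceGroups": [
          {
            "InstanceRole": "MASTER",
            "InstanceType": "m5.xlarge",
            "InstanceCount": 1
          }
        ]
      }
    },
    "Next": "Submit_Job"
  }
}
```

With concurrency left at 1, the Submit_Job state from the question could then use "ActionOnFailure": "TERMINATE_CLUSTER" instead of "CONTINUE".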
Upvotes: 0
Reputation: 439
If your cluster is created and terminated within the state machine itself, another way to achieve this is to use Step Functions error handling: catch the error and route to the terminate step.
For example:
"Submit_Job": {
  "Type": "Task",
  "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
  "Parameters": {
    "ClusterId.$": "$.FormattedInputsForEmr.ClusterId",
    "Step": {
      "Name": "Start Job",
      "ActionOnFailure": "CONTINUE",
      "HadoopJarStep": {
        "Jar": "command-runner.jar",
        "Args.$": "$.FormattedInputsForEmr.Args[*][*]"
      }
    }
  },
  "Catch": [
    {
      "ErrorEquals": [
        "States.ALL"
      ],
      "Next": "<Terminate-Cluster-State>"
    }
  ]
}
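One thing to watch for with this approach: by default a catcher replaces the state input with the error output, so the terminate state would no longer see the ClusterId. A rough, self-contained sketch of how the two states could fit together is below. The state name Terminate_Cluster and the paths $.SubmitJobResult and $.error are names I picked for illustration, not something from the question. ResultPath is set on both the task and the catcher so that the original input, including $.FormattedInputsForEmr.ClusterId, is kept.

```json
{
  "Comment": "Sketch only: Terminate_Cluster, $.SubmitJobResult and $.error are illustrative names",
  "StartAt": "Submit_Job",
  "States": {
    "Submit_Job": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
      "Parameters": {
        "ClusterId.$": "$.FormattedInputsForEmr.ClusterId",
        "Step": {
          "Name": "Start Job",
          "ActionOnFailure": "CONTINUE",
          "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args.$": "$.FormattedInputsForEmr.Args[*][*]"
          }
        }
      },
      "ResultPath": "$.SubmitJobResult",
      "Catch": [
        {
          "ErrorEquals": ["States.ALL"],
          "ResultPath": "$.error",
          "Next": "Terminate_Cluster"
        }
      ],
      "Next": "Terminate_Cluster"
    },
    "Terminate_Cluster": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:terminateCluster.sync",
      "Parameters": {
        "ClusterId.$": "$.FormattedInputsForEmr.ClusterId"
      },
      "End": true
    }
  }
}
```

Note that in this sketch the execution ends as succeeded even when the job failed. If you need the execution to be marked as failed, you could route the catcher to a separate terminate state that is followed by a Fail state.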
Upvotes: 1