Reputation: 119
I basically want to retry a glue job twice after status FAILED or TIMEOUT, before moving to the next stage. My state machine looks like this:
{
"Comment": "A description of my state machine",
"StartAt": "Glue StartJobRun",
"States": {
"Glue StartJobRun": {
"Type": "Task",
"Resource": "arn:aws:states:::glue:startJobRun",
"Parameters": {
"JobName": "test-to-fail"
},
"End": true,
"Retry": [
{
"ErrorEquals": [
"States.ALL"
],
"MaxAttempts": 2,
"IntervalSeconds": 300,
"BackoffRate": 1
}
]
}
}
}
Nevertheless, the job runs one single time when I execute the state machine. Could it be that I misunderstood the logic behind Retry and it applies to the step function state and not the job? I followed this tutorial, which seems to confirm it is the job status. Then, what is wrong? And how can I achieve this?
Thank you!
Upvotes: 0
Views: 541
Reputation: 1321
Actually you getting retry logic wrong
"Glue StartJobRun": { "Type": "Task", "Resource": "arn:aws:states:::glue:startJobRun",
This step will be considered failed if Glue job is not started. And it will retry to start it. As job was started, it is considered success.
Solution: But to check status of the job you should have next step with retry attempts.
somehow like this..
{
"Comment": "A state machine that starts a Glue job and checks its status",
"StartAt": "Start Glue Job",
"States": {
"Start Glue Job": {
"Type": "Task",
"Resource": "arn:aws:states:::glue:startJobRun.sync",
"Parameters": {
"JobName": "YourGlueJobName"
},
"Next": "Check Job Status"
},
"Check Job Status": {
"Type": "Task",
"Resource": "arn:aws:states:::glue:getJobRun",
"Parameters": {
"JobName": "YourGlueJobName",
"RunId.$": "$.RunId"
},
"Retry": [
{
"ErrorEquals": ["States.TaskFailed"],
"IntervalSeconds": 60,
"MaxAttempts": 2,
"BackoffRate": 2.0
}
],
"End": true
}
}
}
Upvotes: 1