Greg McGuffey

Reputation: 3316

How to have a Python glue job return when called in step function?

I have a Glue job, written in Python, that I call from a step function. The step function successfully starts the job, and the job finishes successfully, but the step function never moves to the next step. Is there some configuration or permission required for the step function to respond to the job's success? Or something I need to do in the Python script?

Here is the step function (state machine) definition:

"MyGlueTask": {
  "Type": "Task",
  "Resource": "arn:aws:states:::glue:startJobRun.sync",
  "Parameters": {
    "JobName": "my_glue_job"
  },
  "ResultPath": "$.MyGlueTask",
  "Next": "NextGlueJob"
}

Upvotes: 3

Views: 4228

Answers (2)

Greg McGuffey

Reputation: 3316

The solution to my actual problem was permissions. The role that runs the state machine needs four permissions to run a startJobRun.sync task:

  • glue:StartJobRun
  • glue:GetJobRun
  • glue:GetJobRuns
  • glue:BatchStopJobRun

Those are the action names as I wrote them in Terraform, but they should help anybody struggling with this.
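For anyone not using Terraform, here is a minimal sketch of the same four actions as a plain IAM policy document, built in Python. The function name `build_glue_sync_policy` and the `"*"` resource are illustrative assumptions, not something from my actual setup; you would likely want to narrow the resource to your own job.

```python
import json

# The four actions listed above.
GLUE_SYNC_ACTIONS = [
    "glue:StartJobRun",
    "glue:GetJobRun",
    "glue:GetJobRuns",
    "glue:BatchStopJobRun",
]


def build_glue_sync_policy(resource="*"):
    """Return an IAM policy document allowing the four Glue actions.

    `resource` defaults to "*" purely for illustration (an assumption);
    pass a specific Glue job ARN to restrict it.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(GLUE_SYNC_ACTIONS),
                "Resource": resource,
            }
        ],
    }


if __name__ == "__main__":
    # Print the JSON so it can be pasted into a policy attached to the
    # state machine's role.
    print(json.dumps(build_glue_sync_policy(), indent=2))
```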

Upvotes: 11

pdanchenko

Reputation: 212

Are you sure it never moves to the next step? Maybe it does, but only after, say, 5 minutes?

I'm asking because Step Functions has a limitation: even if your Glue job finishes in a few seconds, Step Functions only polls the Glue job for its result about once every 5 minutes.

One workaround is to change arn:aws:states:::glue:startJobRun.sync to arn:aws:states:::glue:startJobRun. The task then only triggers the Glue job and moves straight on to the next step.

Most likely you will still need to wait for the Glue job to finish and get some result out of it, so you need to wrap that state with a few more.

  1. The first state merely starts the Glue job, but we also need the Glue JobRunId. I don't know whether it can be retrieved from the Glue task itself, so I created a Lambda that starts the job with the boto3 start_job_run function and returns the JobRunId from the response.
  2. Create a Lambda that fetches the status (JobRunState) of the Glue job via the boto3 get_job_run function, using the JobRunId from the previous step.
  3. Using the Wait state type, run that Lambda every N seconds.
  4. Use a Choice state to branch on the Glue job status:
    • If RUNNING, go back to the Wait step.
    • If SUCCEEDED, go ahead to the next state.
    • If FAILED or STOPPED, go wherever else you need.
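The two Lambdas from steps 1 and 2 could look roughly like this. It is a sketch, not my exact code: the handler names, the shape of the `event` dict, and the optional `glue` argument (there only so the functions can be tried with a stub client) are assumptions. The boto3 calls themselves are `start_job_run` and `get_job_run`, which take the run ID as `RunId`.

```python
def _glue_client():
    # Imported lazily so the handlers can be exercised with a stub client
    # on a machine where boto3 is not installed.
    import boto3
    return boto3.client("glue")


def start_job(event, context=None, glue=None):
    """Lambda for step 1: start the Glue job and return its run ID.

    Expects event = {"JobName": "..."} (assumed input shape).
    """
    glue = glue or _glue_client()
    response = glue.start_job_run(JobName=event["JobName"])
    return {"JobName": event["JobName"], "JobRunId": response["JobRunId"]}


def check_job(event, context=None, glue=None):
    """Lambda for step 2: look up the current JobRunState of a run.

    Expects the output of start_job as its input and passes it through,
    adding "JobRunState" so a Choice state can branch on it.
    """
    glue = glue or _glue_client()
    response = glue.get_job_run(JobName=event["JobName"], RunId=event["JobRunId"])
    return {**event, "JobRunState": response["JobRun"]["JobRunState"]}
```

Passing the input through in `check_job` keeps `JobName` and `JobRunId` available each time the loop comes back around from the Wait state.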

Finally, it looks something like this.
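As a rough sketch of such a state machine, the definition can be built as a Python dict and dumped to JSON. The state names, the Lambda ARNs, and the 30-second wait are placeholders I made up for illustration, and it assumes the two Lambdas return `JobRunId` and `JobRunState` in their output. One deliberate deviation from the list above: any status other than SUCCEEDED, RUNNING, or STARTING falls through to the failure state, so the loop cannot spin forever on a status like TIMEOUT.

```python
import json


def build_polling_definition(start_lambda_arn, check_lambda_arn, wait_seconds=30):
    """Return a state machine definition that polls a Glue job via Lambdas.

    All state names here are hypothetical; "NextGlueJob" stands in for
    whatever the real next step is.
    """
    return {
        "StartAt": "StartGlueJob",
        "States": {
            "StartGlueJob": {
                "Type": "Task",
                "Resource": start_lambda_arn,
                "Next": "WaitForGlueJob",
            },
            "WaitForGlueJob": {
                "Type": "Wait",
                "Seconds": wait_seconds,
                "Next": "CheckGlueJob",
            },
            "CheckGlueJob": {
                "Type": "Task",
                "Resource": check_lambda_arn,
                "Next": "IsGlueJobDone",
            },
            "IsGlueJobDone": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.JobRunState", "StringEquals": "SUCCEEDED", "Next": "NextGlueJob"},
                    {"Variable": "$.JobRunState", "StringEquals": "RUNNING", "Next": "WaitForGlueJob"},
                    {"Variable": "$.JobRunState", "StringEquals": "STARTING", "Next": "WaitForGlueJob"},
                ],
                # FAILED, STOPPED and anything unexpected end up here.
                "Default": "GlueJobFailed",
            },
            "NextGlueJob": {"Type": "Pass", "End": True},
            "GlueJobFailed": {
                "Type": "Fail",
                "Error": "GlueJobFailed",
                "Cause": "Glue job did not succeed",
            },
        },
    }


if __name__ == "__main__":
    # Placeholder ARNs; replace with the real Lambda function ARNs.
    definition = build_polling_definition(
        "arn:aws:lambda:REGION:ACCOUNT_ID:function:start-glue-job",
        "arn:aws:lambda:REGION:ACCOUNT_ID:function:check-glue-job",
    )
    print(json.dumps(definition, indent=2))
```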

Upvotes: -1
