naikum

Reputation: 71

AWS: Passing job parameter values to a Glue job from a Step Function

I have created 3 Glue jobs, each of which has a job parameter with a key/value like this: runid id

If I start a Glue job with the AWS CLI like this, it works fine: aws glue start-job-run --job-name $job --arguments='--runid="Runid_10"'

These 3 Glue jobs run inside one Step Functions state machine, whose definition is:

{
  "Comment": "Sample Step Function",
  "StartAt": "First Glue Job",
  "States": {
    "First Glue Job": {
      "Type": "Task",
      "Resource": "arn:aws:states:::glue:startJobRun.sync",
      "Parameters": {
        "JobName": "GlueJob-Firstjob"
      },
      "Next": "Second Glue Job"
    },
    "Second Glue Job": {
      "Type": "Task",
      "Resource": "arn:aws:states:::glue:startJobRun.sync",
      "Parameters": {
        "JobName": "GlueJob-Secondjob"
      },
      "Next": "Third Glue Job"
    },
    "Third Glue Job": {
      "Type": "Task",
      "Resource": "arn:aws:states:::glue:startJobRun.sync",
      "Parameters": {
        "JobName": "GlueJob-Thirdjob"
      },
      "End": true
    }
  }
}

When I execute this state machine with an input parameter, the input value is not passed to the Glue jobs. Do I need to modify the state machine definition so that the value supplied when starting the execution is passed on to each Glue job? Please guide me on how to do it.

aws stepfunctions start-execution --state-machine-arn arn:aws:states:us-east-1:123456789012:stateMachine:HelloWorld --input '{"runid":"Runid_10"}'
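Whatever the definition looks like, the `--input` value has to reach the CLI as one valid JSON string, so unbalanced shell quotes around it are an easy way to lose the input. A minimal sketch, assuming Python 3.8+ (for `shlex.join`) and a POSIX shell: it serializes the input with `json.dumps` and lets `shlex` do the quoting. `start_execution_command` is a hypothetical helper name, and the ARN is the example one from the question.

```python
import json
import shlex


def start_execution_command(state_machine_arn: str, run_id: str) -> str:
    """Build a shell-safe `aws stepfunctions start-execution` command line.

    Hypothetical helper: json.dumps guarantees balanced JSON quotes, and
    shlex.join quotes each argument so the shell passes it through unchanged.
    """
    payload = json.dumps({"runid": run_id})
    return shlex.join([
        "aws", "stepfunctions", "start-execution",
        "--state-machine-arn", state_machine_arn,
        "--input", payload,
    ])


if __name__ == "__main__":
    arn = "arn:aws:states:us-east-1:123456789012:stateMachine:HelloWorld"
    print(start_execution_command(arn, "Runid_10"))
```

The printed command ends with `--input '{"runid": "Runid_10"}'`, i.e. the JSON is wrapped in single quotes so its double quotes survive the shell.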

The state machine executes successfully, but the runid value is not passed to the Glue job parameters.

Please let me know how to pass Glue job parameter values from inside the state machine definition.


I am using the Arguments parameter like this:

{
  "Comment": "Sample Step Function",
  "StartAt": "First Glue Job",
  "States": {
    "First Glue Job": {
      "Type": "Task",
      "Resource": "arn:aws:states:::glue:startJobRun.sync",
      "Parameters": {
        "JobName": "GlueJob-Firstjob",
        "Arguments": {
          "--runid": "$.runid"
        }
      },
      "ResultPath": "$.runid",
      "Next": "Second Glue Job"
    },
    "Second Glue Job": {
      "Type": "Task",
      "Resource": "arn:aws:states:::glue:startJobRun.sync",
      "Parameters": {
        "JobName": "GlueJob-Secondjob",
        "Arguments": {
          "--runid": "$.runid"
        }
      },
      "ResultPath": "$.runid",
      "Next": "Third Glue Job"
    },
    "Third Glue Job": {
      "Type": "Task",
      "Resource": "arn:aws:states:::glue:startJobRun.sync",
      "Parameters": {
        "JobName": "GlueJob-Thirdjob",
        "Arguments": {
          "--runid": "$.runid"
        }
      },
      "ResultPath": "$.runid",
      "End": true
    }
  }
}

I pass the input JSON {"--runid" : "runid_10"} in the "Input - optional" box when I choose "Start execution" for the state machine.

Please note: the value has the form runid_n, where n is an integer that changes from run to run.

In each Glue job I append the runid value to the output file name, so the output file looks like GlueJob-Firstjob_output_runid_10.csv.
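The naming scheme above can be sketched as a small helper. This is an illustration only: `output_file_name` and `RUN_ID_PATTERN` are hypothetical names, and the `runid_<n>` check is a regular expression derived from the note that n is an integer. Inside the Glue job, `run_id` would come from the resolved `--runid` argument.

```python
import re

# Expected shape of the run id: "runid_" followed by an integer. Case-insensitive
# because the question uses both "Runid_10" and "runid_10".
RUN_ID_PATTERN = re.compile(r"runid_\d+", re.IGNORECASE)


def output_file_name(job_name: str, run_id: str) -> str:
    """Return a name like 'GlueJob-Firstjob_output_runid_10.csv'."""
    if not RUN_ID_PATTERN.fullmatch(run_id):
        # Catches e.g. the literal string "$.runid", which is what a job receives
        # when the path in the state machine definition is never resolved.
        raise ValueError(f"unexpected run id: {run_id!r}")
    return f"{job_name}_output_{run_id}.csv"


print(output_file_name("GlueJob-Firstjob", "runid_10"))  # GlueJob-Firstjob_output_runid_10.csv
```

Failing fast on a malformed run id makes an unresolved parameter visible immediately instead of producing a file named after the placeholder.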

Upvotes: 4

Views: 27025

Answers (3)

vijayraj34

Reputation: 2415

In my case, I was trying to pass the output of a previous Lambda function as input to a Glue job. The whole flow ran inside a Step Function.

The Glue job's input is the event produced by the previous Lambda job.

I wanted to pass the "sender" value into the Glue job via the Step Function, so I had to modify the API Parameters as shown below to add the arguments.

Strict syntax: "--sender.$" : "$.sender"

Make sure that:

  • the KEY ends with the suffix .$
  • the VALUE starts with the prefix $. (a path into the state input)
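To see why the `.$` suffix matters, here is a deliberately simplified model of how Step Functions treats a `Parameters` block: a key ending in `.$` has its value looked up as a path in the state input and the suffix is dropped, while every other value is copied literally. This is a sketch, not the service's implementation: `resolve_parameters` and `get_path` are hypothetical helpers that only understand simple dotted paths such as `$.sender` (no arrays, filters or intrinsic functions), and the input values are made up.

```python
from typing import Any


def get_path(data: Any, path: str) -> Any:
    """Look up a simple dotted path such as '$' or '$.a.b'."""
    if path == "$":
        return data
    if not path.startswith("$."):
        raise ValueError(f"unsupported path: {path}")
    for key in path[2:].split("."):
        data = data[key]  # a missing key raises KeyError here (the real state fails)
    return data


def resolve_parameters(params: Any, state_input: Any) -> Any:
    """Resolve '.$' keys against state_input; nested objects are handled recursively."""
    if not isinstance(params, dict):
        return params  # plain values are copied literally
    resolved = {}
    for key, value in params.items():
        if key.endswith(".$"):
            resolved[key[:-2]] = get_path(state_input, value)
        else:
            resolved[key] = resolve_parameters(value, state_input)
    return resolved


sample_input = {"sender": "alice", "runid": "runid_10"}
print(resolve_parameters(
    {"JobName": "GlueJob-Firstjob",
     "Arguments": {"--sender.$": "$.sender", "--runid": "$.runid"}},
    sample_input,
))
```

In the printed result `--sender` becomes `alice`, while `--runid` (no `.$` suffix) is passed on as the literal text `$.runid`.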


In the Code Snippet section, manually add the snippet below.

"ResultPath": "$.output"

[Screenshot: ResultPath setting]
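The point of a `ResultPath` such as `"$.output"` is that a Task's result is written into its input at that path, and the merged document becomes the next state's input. A simplified model, assuming only simple dotted paths: `apply_result_path` is a hypothetical helper and `job_result` is a stand-in for the Glue task's output. With `$.output` the original `runid` survives for the next job; a `ResultPath` of `$.runid`, as in the question's second definition, overwrites the very field the next job reads.

```python
import copy
from typing import Any


def apply_result_path(state_input: dict, result: Any, result_path: str) -> Any:
    """Simplified ResultPath: '$' replaces the input, '$.a.b' sets that field."""
    if result_path == "$":
        return result
    merged = copy.deepcopy(state_input)  # leave the original input untouched
    keys = result_path[2:].split(".")
    target = merged
    for key in keys[:-1]:
        target = target.setdefault(key, {})
    target[keys[-1]] = result
    return merged


state_input = {"runid": "runid_10"}
job_result = {"JobRunState": "SUCCEEDED"}  # stand-in for the Glue task output

print(apply_result_path(state_input, job_result, "$.output"))
# {'runid': 'runid_10', 'output': {'JobRunState': 'SUCCEEDED'}}
print(apply_result_path(state_input, job_result, "$.runid"))
# {'runid': {'JobRunState': 'SUCCEEDED'}}
```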

Then, in the PySpark Glue script, you can access the input parameters with the snippet below.

import sys
from awsglue.utils import getResolvedOptions
args = getResolvedOptions(sys.argv, ['sender'])  # '--sender' is read without the dashes
print(args['sender'])
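`getResolvedOptions` only exists inside the Glue runtime. For local experiments, a rough stand-in can be written with `argparse`, assuming (as `getResolvedOptions(sys.argv, ...)` implies) that Glue hands job arguments to the script through `sys.argv`, next to other arguments it adds itself. `resolve_options` is hypothetical and is not the `awsglue` implementation; `parse_known_args` makes it ignore unrelated arguments, and `allow_abbrev=False` stops an unrelated option from being taken as an abbreviation of a requested one.

```python
import argparse
from typing import Dict, List


def resolve_options(argv: List[str], option_names: List[str]) -> Dict[str, str]:
    """Rough local stand-in for awsglue.utils.getResolvedOptions (hypothetical)."""
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    for name in option_names:
        parser.add_argument(f"--{name}", required=True)
    # argv[0] is the script path; arguments we did not ask for are ignored.
    known, _unknown = parser.parse_known_args(argv[1:])
    return vars(known)


sample_argv = ["script.py", "--JOB_NAME", "GlueJob-Firstjob", "--sender", "alice"]
args = resolve_options(sample_argv, ["sender"])
print(args["sender"])  # prints: alice
```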

I hope this answer serves your purpose.

Upvotes: 4

Frosty

Reputation: 698

You need to add it inside Arguments within the Parameters attribute, like this:

"Parameters": {
  "JobName": "GlueJob-Firstjob",
  "Arguments": {
    "--runid.$": "$.runid"
  }
}

For reference, see the "Supported parameters" section here.

UPDATE: You also need to add a ResultPath to your task definitions, so that each job's result does not overwrite runid in the state, like this:

"First Glue Job": {
  "Type": "Task",
  "Resource": "arn:aws:states:::glue:startJobRun.sync",
  "Parameters": {
    "JobName": "GlueJob-Firstjob",
    "Arguments": {
      "--runid.$": "$.runid"
    }
  },
  "ResultPath": "$.output"
}
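Applying both fixes to all three jobs, the full definition could be generated with a short script. A sketch, assuming the state and job names from the question and that every job should receive the same `--runid` from the execution input: the `.$` suffix makes Step Functions read `$.runid` instead of passing the text literally, and `"ResultPath": "$.output"` keeps `runid` in the state for the next job. `build_definition` and `JOBS` are hypothetical names; the printed JSON would be used as the state machine definition.

```python
import json
from typing import List, Tuple

# (state name, Glue job name) pairs, taken from the question's definition.
JOBS: List[Tuple[str, str]] = [
    ("First Glue Job", "GlueJob-Firstjob"),
    ("Second Glue Job", "GlueJob-Secondjob"),
    ("Third Glue Job", "GlueJob-Thirdjob"),
]


def build_definition(jobs: List[Tuple[str, str]]) -> dict:
    """Chain the Glue jobs; each one receives --runid from the execution input."""
    states = {}
    for i, (state_name, job_name) in enumerate(jobs):
        state = {
            "Type": "Task",
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {
                "JobName": job_name,
                # ".$" suffix: the value is a path into the state input, not a literal.
                "Arguments": {"--runid.$": "$.runid"},
            },
            # Store the job result under "output" so "runid" stays intact.
            "ResultPath": "$.output",
        }
        if i + 1 < len(jobs):
            state["Next"] = jobs[i + 1][0]
        else:
            state["End"] = True
        states[state_name] = state
    return {"Comment": "Sample Step Function", "StartAt": jobs[0][0], "States": states}


print(json.dumps(build_definition(JOBS), indent=2))
```

The execution input then only needs the key the path refers to, e.g. {"runid": "runid_10"}.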

Upvotes: 10

ms12

Reputation: 581

The input value 'runid' is passed as an event to the Lambda function inside your Step Function. To pass it from one Lambda function to the next, you can simply return the event, which carries the data from start to finish. That event contains your 'runid' argument.
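A minimal sketch of that pattern, assuming a Python Lambda in the same state machine: the handler does its work and returns the incoming event plus any new fields, so `runid` is still present in the state the next step receives. `lambda_handler` is the conventional default handler name for Python Lambdas; the `processed` field is purely illustrative.

```python
from typing import Any, Dict


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Pass the state through, adding only new (illustrative) fields."""
    run_id = event["runid"]  # e.g. "runid_10" from the execution input
    print(f"processing {run_id}")  # the Lambda's real work would go here
    return {**event, "processed": True}  # 'processed' is a hypothetical field
```

For example, `lambda_handler({"runid": "runid_10"}, None)` returns `{"runid": "runid_10", "processed": True}`, so a following state can still read `$.runid`.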

Take a look here.

Upvotes: 0
