Reputation: 71
I have created 3 Glue jobs, each with a job parameter whose key/value looks like this: runid id
If I execute a Glue job using the AWS CLI like this, it works fine: aws glue start-job-run --job-name $job --arguments='--runid="Runid_10"'
These 3 Glue jobs are inside one Step Function, and the state machine definition is:
{
"Comment":"Sample Step Function",
"StartAt":"First Glue Job",
"States": {
"First Glue Job":{
"Type": "Task",
"Resource": "arn:aws:states:::glue:startJobRun.sync",
"Parameters": {
"JobName": "GlueJob-Firstjob"
},
"Next": "Second Glue Job"
},
"Second Glue Job":{
"Type": "Task",
"Resource": "arn:aws:states:::glue:startJobRun.sync",
"Parameters": {
"JobName": "GlueJob-Secondjob"
},
"Next": "Third Glue Job"
},
"Third Glue Job":{
"Type": "Task",
"Resource": "arn:aws:states:::glue:startJobRun.sync",
"Parameters": {
"JobName": "GlueJob-Thirdjob"
},
"End": true
}
}
}
If I try to execute this state machine with an input parameter, the value is not passed to the Glue job. Do I need to modify the state machine definition so that the input given to the state machine run reaches the Glue job? Please guide me on how to do it.
aws stepfunctions start-execution --state-machine-arn arn:aws:states:us-east-1:123456789012:stateMachine:HelloWorld --input '{"runid":"Runid_10"}'
The state machine executes successfully, but the runid value is not passed to the Glue job parameters.
Please let me know how to pass Glue job parameter values from within the state machine definition.
I then tried using the Arguments parameter like this:
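For readers starting the execution from code rather than the CLI, here is a minimal sketch of the same call using boto3's Step Functions client. The helper names build_start_execution_kwargs and start_execution are hypothetical; building the input with json.dumps avoids the shell-quoting problems that hand-written JSON is prone to.

```python
import json


def build_start_execution_kwargs(state_machine_arn, runid):
    """Build keyword arguments for the Step Functions StartExecution call."""
    # json.dumps guarantees valid JSON with correctly quoted keys and values.
    return {
        "stateMachineArn": state_machine_arn,
        "input": json.dumps({"runid": runid}),
    }


def start_execution(state_machine_arn, runid):
    """Start the state machine; requires boto3 and AWS credentials."""
    # Imported here so the pure helper above can be used without boto3.
    import boto3

    client = boto3.client("stepfunctions")
    return client.start_execution(
        **build_start_execution_kwargs(state_machine_arn, runid)
    )
```

Note that the input key is runid, without leading dashes, so a path such as $.runid in the state machine definition can find it.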
{
"Comment":"Sample Step Function",
"StartAt":"First Glue Job",
"States": {
"First Glue Job":{
"Type": "Task",
"Resource": "arn:aws:states:::glue:startJobRun.sync",
"Parameters": {
"JobName": "GlueJob-Firstjob",
"Arguments": {
"--runid":"$.runid"
}
},
"ResultPath" : "$.runid",
"Next": "Second Glue Job"
},
"Second Glue Job":{
"Type": "Task",
"Resource": "arn:aws:states:::glue:startJobRun.sync",
"Parameters": {
"JobName": "GlueJob-Secondjob",
"Arguments": {
"--runid":"$.runid"
}
},
"ResultPath" : "$.runid",
"Next": "Third Glue Job"
},
"Third Glue Job":{
"Type": "Task",
"Resource": "arn:aws:states:::glue:startJobRun.sync",
"Parameters": {
"JobName": "GlueJob-Thirdjob",
"Arguments": {
"--runid":"$.runid"
}
},
"ResultPath" : "$.runid",
"End": true
}
}
}
I pass the input JSON {"--runid" : "runid_10"} in the "Input - optional" window when starting the state machine execution.
Please note: the value has the form runid_n, where n is an integer that changes between runs.
I append the runid_10 value to the output file name in the Glue job, so the output file looks like GlueJob-Firstjob_output_runid_10.csv
Upvotes: 4
Views: 27025
Reputation: 2415
In my case, I was trying to pass the output of a previous Lambda function as input to a Glue job. The whole flow ran inside a Step Function.
The input for the Glue job was obtained as the event from the previous Lambda function.
I wanted to pass the "sender" value into the Glue job via the Step Function, so I had to modify the API Parameters as below to add the argument.
Strict syntax: "--sender.$" : "$.sender" (the ".$" suffix on the key tells Step Functions to treat the value as a path into the state input rather than a literal string)
Make sure to manually add the snippet below in the Code Snippet section:
"ResultPath": "$.output"
Then, in the PySpark Glue script, you can access the input parameter with the snippet below.
import sys
from awsglue.utils import getResolvedOptions
args = getResolvedOptions(sys.argv, ['sender'])
print(args['sender'])
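Applied to the question's runid parameter, the resolved value could then be used to name the output file. The sketch below is only an illustration that runs outside Glue: it uses argparse as a stand-in for getResolvedOptions, and resolve_runid and output_filename are hypothetical helper names.

```python
import argparse


def resolve_runid(argv):
    """Extract the value of --runid from a Glue-style argument list."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--runid", required=True)
    # parse_known_args ignores other arguments such as --JOB_NAME.
    args, _unknown = parser.parse_known_args(argv)
    return args.runid


def output_filename(job_name, runid):
    """Build the output file name in the form described in the question."""
    return f"{job_name}_output_{runid}.csv"


if __name__ == "__main__":
    example_argv = ["--JOB_NAME", "GlueJob-Firstjob", "--runid", "runid_10"]
    print(output_filename("GlueJob-Firstjob", resolve_runid(example_argv)))
```

Inside an actual Glue job, getResolvedOptions(sys.argv, ['runid']) would take the place of resolve_runid.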
Upvotes: 4
Reputation: 698
You need to add it inside Arguments within the Parameters attribute, so it will look like:
"Parameters" : {
"JobName": "GlueJob-Firstjob",
"Arguments": {
"--runid.$":"$.runid"
}
}
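For reference, check the Supported parameters section here.
UPDATE: You also need to add ResultPath to your task definitions so that each task's result is stored under a separate key instead of replacing the state input (which would drop runid for the following states):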
"First Glue Job":{
"Type": "Task",
"Resource": "arn:aws:states:::glue:startJobRun.sync",
"Parameters": {
"JobName": "GlueJob-Firstjob",
"Arguments": {
"--runid.$":"$.runid"
}
},
"ResultPath": "$.output"
}
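To see why both the ".$" key suffix and the ResultPath matter, here is a deliberately simplified Python model of how a Task state resolves Parameters and merges its result. It is not the Step Functions implementation: resolve_parameters and apply_result_path are hypothetical names, only top-level paths of the form $.field are handled, and the JobRunState result is just an illustrative value.

```python
def resolve_parameters(template, state_input):
    """Resolve a Parameters template against the state input (simplified)."""
    resolved = {}
    for key, value in template.items():
        if isinstance(value, dict):
            resolved[key] = resolve_parameters(value, state_input)
        elif key.endswith(".$"):
            # Keys ending in ".$" mean "the value is a path, not a literal".
            assert value.startswith("$."), "only '$.field' paths are modelled"
            resolved[key[:-2]] = state_input[value[2:]]
        else:
            # Without the suffix the value is passed through unchanged.
            resolved[key] = value
    return resolved


def apply_result_path(state_input, result, field):
    """Model ResultPath '$.<field>': store the result under one key."""
    merged = dict(state_input)
    merged[field] = result
    return merged


state_input = {"runid": "Runid_10"}
result = {"JobRunState": "SUCCEEDED"}  # illustrative task result

with_suffix = resolve_parameters(
    {"JobName": "GlueJob-Firstjob", "Arguments": {"--runid.$": "$.runid"}},
    state_input,
)
without_suffix = resolve_parameters(
    {"JobName": "GlueJob-Firstjob", "Arguments": {"--runid": "$.runid"}},
    state_input,
)
kept = apply_result_path(state_input, result, "output")
overwritten = apply_result_path(state_input, result, "runid")
```

In this model, with_suffix carries the real run id, while without_suffix passes the literal text $.runid to the job. Likewise, a ResultPath of $.output keeps runid available for the next state, whereas $.runid replaces it with the job result.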
Upvotes: 10
Reputation: 581
The input value 'runid' is passed as the event to the Lambda function inside your Step Function. To pass it from one Lambda function to the next, you can simply return the event, which carries the data from start to finish. That event contains your 'runid' argument.
Take a look here.
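If a Lambda task is part of the state machine, the pass-through idea could look like the sketch below. The handler name follows the usual Lambda convention, and the added status field is purely hypothetical.

```python
def lambda_handler(event, context):
    """Return the incoming event so later states still see 'runid'."""
    result = dict(event)  # copy to avoid mutating the caller's dict
    result["status"] = "done"  # hypothetical extra field added by this step
    return result
```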
Upvotes: 0