Reputation: 2850
I have a pythonshell job inside AWS glue that needs to download a file from a s3 path. This s3 path location is a variable so will come to the glue job as a payload in start_run_job
call like below:
import boto3
payload = {'s3_target_file':s3_TARGET_FILE_PATH,
's3_test_file': s3_TEST_FILE_PATH}
job_def = dict(
JobName=MY_GLUE_PYTHONSHELL_JOB,
Arguments=payload,
WorkerType='Standard',
NumberOfWorkers=2,
)
response = glue.start_job_run(**job_def)
My question is, how do I retrieve those s3 paths from the payload inside AWS Glue pythonshell job that comes through boto3? Is there any sort of handler we need to write similar to AWS Lambda?
Please suggest.
Upvotes: 1
Views: 752
Reputation: 13581
Check the docimentation. All you need is here.
You can use the getResolvedOptions
as follows:
import sys
from awsglue.utils import getResolvedOptions
args = getResolvedOptions(sys.argv,
['JOB_NAME',
'day_partition_key',
'hour_partition_key',
'day_partition_value',
'hour_partition_value'])
print "The day partition key is: ", args['day_partition_key']
print "and the day partition value is: ", args['day_partition_value']
Upvotes: 2