Reputation: 792
I'm trying to use Glue ETL as a job scheduler for my Python script which also references a JSON config file.
According to https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html, there is a parameter called --extra-files,
described as an S3 path to additional files such as configuration files. I can't find this option on the console when I create my job.
What I've done instead is upload my config file to the same S3 bucket as my Python script for Glue ETL, and set its location in the
Referenced files path parameter.
Within my script, I refer to my config file as:
    with open('config.json', 'r') as config:
        config = json.load(config)
There aren't any issues with the logic of my code, as it all works fine when run locally.
However, when I try to run the Glue ETL job, it fails with the message No such file or directory: 'config.json'.
What am I doing wrong here? How can I make my use case work with Glue ETL?
Upvotes: 1
Views: 3285
Reputation: 633
These arguments, including --extra-files, can be passed as job parameters. On the console, they are set under the section Security configuration, script libraries, and job parameters (optional) when creating or editing a job.
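If you create the job from code rather than the console, the same parameter can go in the job's default arguments. Below is a rough sketch that only builds the request dictionary; the function name, bucket paths, role ARN, and job name are all hypothetical placeholders, and I'm assuming you would pass the result to boto3's Glue create_job call.

```python
def build_job_definition(job_name, role_arn, script_s3_path, config_s3_path,
                         command_name="glueetl"):
    """Build keyword arguments for a Glue create_job request.

    command_name is "glueetl" for a Spark ETL job or "pythonshell" for a
    Python shell job. All values passed in here are placeholders.
    """
    return {
        "Name": job_name,
        "Role": role_arn,
        "Command": {
            "Name": command_name,
            "ScriptLocation": script_s3_path,
            "PythonVersion": "3",
        },
        # --extra-files is the job parameter from the linked documentation page.
        "DefaultArguments": {"--extra-files": config_s3_path},
    }


# Hypothetical values, for illustration only.
definition = build_job_definition(
    job_name="my-config-job",
    role_arn="arn:aws:iam::123456789012:role/MyGlueRole",
    script_s3_path="s3://my-bucket/scripts/job.py",
    config_s3_path="s3://my-bucket/scripts/config.json",
)

# Assumed usage (needs boto3 and AWS credentials, so it is not run here):
#   import boto3
#   boto3.client("glue").create_job(**definition)
```

On the console, the equivalent would be a job parameter with key --extra-files and the S3 path as its value.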
As per this answer, if you are using the Referenced files path variable in a Python shell job, the referenced file ends up in /tmp, which a Python shell job has no access to by default.
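Since the relative path 'config.json' only works when the file is in the current working directory, one defensive option is to look in a few candidate directories and fail with a clearer error. This is a minimal sketch, not a confirmed fix: load_config is a hypothetical helper, and the default search list (current directory, then /tmp) is an assumption based on the locations mentioned above.

```python
import json
import os


def load_config(filename="config.json", search_dirs=None):
    """Return the parsed JSON from the first directory that contains filename."""
    if search_dirs is None:
        # Assumed locations: the working directory, then /tmp.
        search_dirs = [os.getcwd(), "/tmp"]

    tried = []
    for directory in search_dirs:
        candidate = os.path.join(directory, filename)
        tried.append(candidate)
        if os.path.isfile(candidate):
            with open(candidate, "r") as handle:
                return json.load(handle)

    # Report every path that was checked, which helps when reading job logs.
    raise FileNotFoundError(
        "Could not find {!r}; looked in: {}".format(filename, ", ".join(tried))
    )
```

In the job script you would then call config = load_config() instead of opening 'config.json' directly. If the file is not in any of the directories, the error lists the paths that were checked.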
Upvotes: 2