Ricardo Francois

Reputation: 792

Glue ETL: How to reference config file as extra file using AWS Management Console?

I'm trying to use Glue ETL as a job scheduler for my Python script which also references a JSON config file.

According to https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html, there is a parameter called --extra-files, which takes an S3 path to additional files such as configuration files. I can't find this option in the console when I create my job.

What I've done is upload my config file to the same S3 bucket as my Python script for Glue ETL, which I include in the Referenced files path parameter.

Within my script, I refer to my config file as:

import json
with open('config.json', 'r') as config_file:  # relative path, resolved against the job's working directory
    config = json.load(config_file)

There aren't any issues with the logic of my code as it all works fine when run locally.

However, when I run the Glue ETL job, it fails with the message No such file or directory: 'config.json'.

What am I doing wrong here? How can I make my use case work with Glue ETL?

Upvotes: 1

Views: 3285

Answers (1)

Rohit P

Reputation: 633

These arguments can be passed as job parameters. On the console, they are found under the section Security configuration, script libraries, and job parameters (optional) when creating or editing a job.
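To make the key/value shape concrete, here is a minimal sketch of the job parameter as a Python dict. The key `--extra-files` comes from the documentation linked in the question; the helper name `build_job_arguments` and the bucket path are hypothetical placeholders. In the console you would enter the same key and value in the job parameters table. If you create the job with boto3 instead, a dict like this can be passed as `DefaultArguments` to the Glue client's `create_job` call.

```python
def build_job_arguments(extra_files_s3_path):
    """Build the job-parameter dict for --extra-files.

    build_job_arguments is a hypothetical helper, not part of any AWS SDK.
    """
    if not extra_files_s3_path.startswith("s3://"):
        raise ValueError("expected an S3 path starting with s3://")
    # Key is the special parameter name; value is the S3 path to the file.
    return {"--extra-files": extra_files_s3_path}


# Hypothetical bucket and key, replace with your own.
job_arguments = build_job_arguments("s3://my-bucket/scripts/config.json")
print(job_arguments)
```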


As per this answer, if you are using the Referenced files path field in a Python shell job, the referenced file is placed in /tmp, which a Python shell job cannot access by default.
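Since the working directory of the job may not be where the file lands, one option is to stop relying on a bare relative path and search a few candidate directories instead. The following is only a sketch: `find_config` and `load_config` are hypothetical helper names, and the candidate list (current directory, the script's directory, the system temp directory) is an assumption, not documented Glue behavior. If the job really cannot read the directory, the lookup raises FileNotFoundError with the list of places it tried, which at least makes the failure easier to diagnose.

```python
import json
import os
import sys
import tempfile


def find_config(filename, extra_dirs=()):
    """Return the first existing path for filename among candidate directories."""
    candidates = [
        os.getcwd(),                                    # job's working directory
        os.path.dirname(os.path.abspath(sys.argv[0])),  # directory of the script
        tempfile.gettempdir(),                          # usually /tmp on Linux
        *extra_dirs,                                    # any extra locations you want to try
    ]
    for directory in candidates:
        path = os.path.join(directory, filename)
        if os.path.isfile(path):
            return path
    raise FileNotFoundError(f"{filename} not found in any of: {candidates}")


def load_config(filename="config.json", extra_dirs=()):
    """Locate the config file and parse it as JSON."""
    with open(find_config(filename, extra_dirs), "r") as config_file:
        return json.load(config_file)
```

In the script, `config = load_config()` would then replace the `with open('config.json', 'r')` block.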

Upvotes: 2
