Emma

Reputation: 403

Import error: No module in AWS Glue job script (Python)

I am trying to run custom Python code that requires libraries not supported by AWS Glue (pandas). I created a zip file with the necessary libraries and uploaded it to an S3 bucket. When running the job, I pointed to the S3 path in the advanced properties. Still, the job does not run successfully. Can anyone suggest why?

1. Do I have to include my code in the zip file? If yes, how will Glue know that it is the code?
2. Do I need to create a package, or will just a zip file do?

Appreciate the help!

Upvotes: 3

Views: 16584

Answers (3)

Yuva

Reputation: 3163

An update on AWS Glue Jobs released on 22nd Jan 2019.

Introducing Python Shell Jobs in AWS Glue -- Posted On: Jan 22, 2019

Python shell jobs in AWS Glue support scripts that are compatible with Python 2.7 and come pre-loaded with libraries such as the Boto3, NumPy, SciPy, pandas, and others. You can run Python shell jobs using 1 DPU (Data Processing Unit) or 0.0625 DPU (which is 1/16 DPU). A single DPU provides processing capacity that consists of 4 vCPUs of compute and 16 GB of memory.

More info at: https://aws.amazon.com/about-aws/whats-new/2019/01/introducing-python-shell-jobs-in-aws-glue/

https://docs.aws.amazon.com/glue/latest/dg/add-job-python.html
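For reference, a Python shell job like the one described in the announcement could be defined through boto3 roughly as follows. This is only a sketch: the job name, role name, and S3 path are placeholders I made up, and `python_shell_job_args` is a hypothetical helper that just builds the keyword arguments you would pass to `create_job`. The capacity values are the two mentioned above (1 DPU or 0.0625 DPU).

```python
def python_shell_job_args(name, role, script_s3_path, small=True):
    """Build keyword arguments for boto3's glue.create_job (Python shell job).

    `small=True` picks 0.0625 DPU (1/16 DPU); otherwise 1 DPU is used.
    All argument values passed in by the caller are placeholders here.
    """
    return {
        "Name": name,
        "Role": role,
        "Command": {
            "Name": "pythonshell",            # job type for Python shell jobs
            "ScriptLocation": script_s3_path,  # the job script lives here, not in a library zip
            "PythonVersion": "2",              # the announcement mentions Python 2.7
        },
        "MaxCapacity": 0.0625 if small else 1.0,
    }


# Hypothetical names, for illustration only.
args = python_shell_job_args(
    "my-pandas-job", "MyGlueRole", "s3://my-bucket/scripts/job.py"
)

# With AWS credentials configured, the job could then be created with:
#   import boto3
#   boto3.client("glue").create_job(**args)
```

Keeping the argument building separate from the actual AWS call makes it easy to inspect what would be sent before anything is created.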

Upvotes: 2

Nikolay D

Reputation: 347

As Yuva's answer mentions, I believe it is currently impossible to import a library that is not pure Python, and the documentation reflects that.

However, in case someone came here looking for how to import a Python library in AWS Glue in general, there is a good explanation in this post of how to do it with the pg8000 library: AWS Glue - Truncate destination postgres table prior to insert
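To make the general idea concrete, here is a minimal sketch of packaging a pure-Python library into a zip so that the package folder sits at the root of the archive, which is what Python's zip import needs. The helper name `zip_package` and the paths in the usage comment are my own, not something from the linked post. As far as I know, the resulting zip is what you upload to S3 and reference in the job's Python library path, while the job script itself stays separate.

```python
import os
import zipfile


def zip_package(package_dir, zip_path):
    """Zip a package directory so the package folder is at the zip root.

    Example: package_dir="build/pg8000" produces entries like
    "pg8000/__init__.py", which is importable once the zip is on sys.path.
    """
    package_dir = os.path.abspath(package_dir)
    parent = os.path.dirname(package_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                if name.endswith(".pyc"):
                    continue  # skip bytecode caches
                full_path = os.path.join(root, name)
                # Store paths relative to the parent so the package name is kept.
                zf.write(full_path, os.path.relpath(full_path, parent))
    return zip_path


# Hypothetical usage:
#   zip_package("build/pg8000", "pg8000.zip")
```

You can sanity-check the archive locally by inserting the zip path into `sys.path` and importing the package; if that fails on your machine, it will most likely fail in Glue as well.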

Upvotes: 0

Yuva

Reputation: 3163

According to the AWS Glue documentation:

Only pure Python libraries can be used. Libraries that rely on C extensions, such as the pandas Python Data Analysis Library, are not yet supported.

I think it wouldn't work even if we upload the Python library as a zip file, if the library depends on C extensions. I tried using Pandas, Holidays, etc. the same way you did, and when I contacted AWS Support, they said support for these Python libraries is on their to-do list, but there is no ETA as of now.

So any library that is not pure Python will not work in AWS Glue at this point. But it should be available in the near future, since this is a popular request.

If you would still like to try it out, please refer to this link, which explains how to package external libraries to run in AWS Glue. I tried it, but it didn't work for me.
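Before uploading, one rough way to see whether a zip is likely to hit the pure-Python limitation is to look for compiled extension modules inside it. This is a heuristic sketch only: the helper name and the list of suffixes are my assumptions, and a clean result does not guarantee the library will work in Glue.

```python
import zipfile

# File suffixes that usually indicate compiled (non pure-Python) modules.
# This list is an assumption for illustration, not an official one.
COMPILED_SUFFIXES = (".so", ".pyd", ".dll", ".dylib")


def compiled_files_in_zip(zip_path):
    """Return the archive entries that look like compiled extension modules."""
    with zipfile.ZipFile(zip_path) as zf:
        return [
            name for name in zf.namelist()
            if name.lower().endswith(COMPILED_SUFFIXES)
        ]


# Hypothetical usage:
#   offenders = compiled_files_in_zip("my_libs.zip")
#   if offenders:
#       print("Not pure Python:", offenders)
```

A zip built from pandas will list compiled files here, which is consistent with the documentation quote above.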

Upvotes: 2
