Reputation: 403
I am trying to provide my custom python code which requires libraries that are not supported by AWS(pandas). So, I created a zip file with the necessary libraries and uploaded it to the S3 bucket. While running the job, I pointed the path of S3 bucket in the advanced properties.Still my job is not running successfully. Can anyone suggest why? 1.Do I have to include my code in the zip file? If yes then how will Glue understand that it's the code? 2. Also do I need to create a package or just zip file will do? Appreciate the help!
Upvotes: 3
Views: 16584
Reputation: 3163
An update on AWS Glue Jobs released on 22nd Jan 2019.
Introducing Python Shell Jobs in AWS Glue -- Posted On: Jan 22, 2019
Python shell jobs in AWS Glue support scripts that are compatible with Python 2.7 and come pre-loaded with libraries such as the Boto3, NumPy, SciPy, pandas, and others. You can run Python shell jobs using 1 DPU (Data Processing Unit) or 0.0625 DPU (which is 1/16 DPU). A single DPU provides processing capacity that consists of 4 vCPUs of compute and 16 GB of memory.
More info at : https://aws.amazon.com/about-aws/whats-new/2019/01/introducing-python-shell-jobs-in-aws-glue/
https://docs.aws.amazon.com/glue/latest/dg/add-job-python.html
Upvotes: 2
Reputation: 347
As Yuva's answer mentioned, I believe it's currently impossible to import a library that is not purely in Python and the documentation reflects that.
However, in case someone came here looking for an answer on how to import a python library in AWS Glue in general, there is a good explanation in this post on how to do it with the pg8000 library: AWS Glue - Truncate destination postgres table prior to insert
Upvotes: 0
Reputation: 3163
According to AWS Glue Documentation:
Only pure Python libraries can be used. Libraries that rely on C extensions, such as the pandas Python Data Analysis Library, are not yet supported.
I think it wouldn't work even if we upload the python library as a zip file, if the library you are using has a dependency for C extensions. I had tried using Pandas, Holidays, etc the same way you have tried, and on contacting AWS Support, they mentioned it is in their to do list (support for these python libaries), but no ETA as of now.
So, any libraries that are not native python, would not work in AWS Glue, at this point. But should be available in the near future, since this is a popular demand.
If still you would like to try it out, please refer to this link, where its explained how to package the external libraries to run in AWS glue, I tried it but didnt work for me.
Upvotes: 2