Miguel Trejo

Reputation: 6687

How to use Awswrangler inside a Glue Job?

For various reasons, I want to use the Python package awswrangler inside a Python 3 Glue job. There are two main ways I've considered for installing awswrangler:

import os
os.system('python -m pip install --user awswrangler==0.0.b0')

Notice that in the last case I've even gone down to the first pre-release version of awswrangler. The full list of versions can be found here. However, even with the first pre-release I'm unable to use awswrangler in a Glue script. Is there a way to achieve this?

Upvotes: 5

Views: 11154

Answers (3)

Jin

Reputation: 321

Add the key/value pair below to the Glue job parameters. This worked for me for installing and using awswrangler.

Key: --additional-python-modules
Value: pyarrow==2,awswrangler==2.4.0
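If the job is created from code rather than the console, the same key/value pair can be passed as a default argument. The following is only a minimal sketch, assuming a Spark ETL job on Glue 2.0 or later and boto3 as the client library; the helper names, job name, role, and script location are hypothetical placeholders, not part of the original answer.

```python
def build_default_arguments(modules):
    """Return Glue job arguments asking Glue to pip-install `modules`."""
    # Glue expects a single comma-separated string of pip requirement specifiers.
    return {"--additional-python-modules": ",".join(modules)}


def create_job(name, role_arn, script_location, modules):
    """Create a Glue job with boto3 (all argument values are placeholders)."""
    import boto3  # imported lazily so the helper above works without boto3

    glue = boto3.client("glue")
    return glue.create_job(
        Name=name,
        Role=role_arn,
        Command={
            "Name": "glueetl",  # assumption: a Spark ETL job
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        GlueVersion="2.0",  # assumption: Glue 2.0 or later
        DefaultArguments=build_default_arguments(modules),
    )


# Build the same key/value pair shown above.
args = build_default_arguments(["pyarrow==2", "awswrangler==2.4.0"])
print(args)
```

Only build_default_arguments runs here; create_job needs AWS credentials and an existing IAM role, so it is defined but not called.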

Upvotes: 9

Mohammad AL-Amar

Reputation: 21

Use this; it worked for me:

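import subprocess
import sys

# Install awswrangler into /tmp/ at runtime, discarding pip's output.
subprocess.call('pip3 install awswrangler -t /tmp/ --no-cache-dir'.split(), stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
# Make the packages installed in /tmp/ importable.
sys.path.insert(1, '/tmp/')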

Upvotes: 2

Miguel Trejo

Reputation: 6687

It turns out that the official awswrangler documentation provides a .whl file containing the desired version of the package, which you can specify in the Python library path field of the Glue job. According to the documentation, the steps to follow are:

  1. Download the .whl file for the version of awswrangler that you want to install from here.

  2. Upload the .whl file to an S3 bucket. Note that the role you assign to your Glue job must have read access to this bucket.

  3. In the Python library path field, specify the location of the wheel file. For example, for the current 1.9.3 version it is s3://your-bucket/glue_wheels/awswrangler-1.9.3-py3-none-any.whl
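Steps 2 and 3 can also be scripted. This is a rough sketch rather than the documented procedure: it assumes boto3 is available, that the wheel has already been downloaded locally, and that the console's Python library path field corresponds to the --extra-py-files job parameter. The helper names are hypothetical.

```python
def wheel_s3_uri(bucket, prefix, wheel_filename):
    """Build the S3 URI that goes into the Python library path field."""
    return f"s3://{bucket}/{prefix.strip('/')}/{wheel_filename}"


def upload_wheel(local_path, bucket, key):
    """Upload a locally downloaded wheel to S3 (requires AWS credentials)."""
    import boto3  # imported lazily so the URI helper works without boto3

    boto3.client("s3").upload_file(local_path, bucket, key)


wheel = "awswrangler-1.9.3-py3-none-any.whl"
uri = wheel_s3_uri("your-bucket", "glue_wheels", wheel)

# Assumption: this job parameter is the equivalent of the console field.
extra_args = {"--extra-py-files": uri}
print(extra_args)
```

upload_wheel is defined but not called, since it needs real credentials and a real bucket; the bucket and prefix are the placeholders from step 3.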

Upvotes: 2
