bda

Reputation: 422

smart_open Python Library

Is the smart-open Python library considered a C library? https://pypi.org/project/smart-open/

I have packaged it, uploaded it to S3, and am trying to use it in an AWS Glue Python Shell job as described in these instructions: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-libraries.html#aws-glue-programming-python-libraries-job

However, I get an error when running the job (error log below). Could the cause be that smart_open is a C library, or would the error look different in that case?

Traceback (most recent call last):
  File "/tmp/runscript.py", line 118, in <module>
    runpy.run_path(temp_file_path, run_name='__main__')
  File "/usr/local/lib/python3.6/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/usr/local/lib/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/tmp/glue-python-scripts-kuvx2b2y/hello-world.py", line 1, in <module>
ModuleNotFoundError: No module named 'smart_open'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/runscript.py", line 137, in <module>
    raise e_type(e_value).with_tracsback(new_stack)
AttributeError: 'ModuleNotFoundError' object has no attribute 'with_tracsback'

Upvotes: 1

Views: 6515

Answers (3)

piyumi_rameshka

Reputation: 350

To import the smart-open library into an AWS Glue Python Shell job, the job's Python library path must contain either a smart-open wheel (.whl) file or an egg file. To create the smart-open egg file:

  1. Download the smart-open source files (for example, from the project's PyPI page: https://pypi.org/project/smart-open/)
  2. Extract the zipped file
  3. Inside the smart-open root folder, run python3 setup.py bdist_egg
  4. This creates the smart-open egg file inside the dist folder
  5. Upload the egg file to S3 and add its S3 path to the Glue job's Python library path

The steps above should resolve the ModuleNotFoundError for smart_open.
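
The build and upload steps can also be scripted. The following is only a rough sketch, assuming boto3 is installed and AWS credentials are configured on the machine doing the build; the function names, the bucket, and the key are hypothetical placeholders, not part of smart-open or Glue.

```python
import subprocess
import sys
from pathlib import Path


def newest_egg(dist_dir):
    """Return the most recently modified .egg file in dist_dir."""
    eggs = sorted(Path(dist_dir).glob("*.egg"), key=lambda p: p.stat().st_mtime)
    if not eggs:
        raise FileNotFoundError(f"no .egg file found in {dist_dir}")
    return eggs[-1]


def build_egg(source_dir):
    """Run `setup.py bdist_egg` inside source_dir and return the built egg."""
    source_dir = Path(source_dir)
    subprocess.run(
        [sys.executable, "setup.py", "bdist_egg"],
        cwd=source_dir,
        check=True,  # raise if the build fails
    )
    return newest_egg(source_dir / "dist")


def upload_to_s3(local_path, bucket, key):
    """Upload a file to S3 and return its s3:// path (needs boto3 and credentials)."""
    import boto3  # third-party; imported here so the helpers above work without it

    boto3.client("s3").upload_file(str(local_path), bucket, key)
    return f"s3://{bucket}/{key}"


# Hypothetical usage (paths and bucket name are placeholders):
# egg = build_egg("path/to/extracted/smart_open")
# print(upload_to_s3(egg, "bucket-name", f"libs/{egg.name}"))
```

The printed s3:// path is what would go into the job's Python library path.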

Upvotes: 2

bda

Reputation: 422

  1. Per Alex Hall's response, smart_open is entirely Python.
  2. I ended up utilizing the s3fs library instead of smart_open.
  3. To import s3fs into my AWS Glue Python Shell job, I downloaded the s3fs wheel file from the s3fs package page (https://pypi.org/project/s3fs/#files), uploaded it to S3, and specified its location in the "Python Lib Path" setting of the Glue Python Shell job. Example: s3://bucket-name/s3fs-0.4.2-py3-none-any.whl
  4. When my Glue Python Shell Job runs, it imports s3fs and all its dependencies.
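
To confirm from inside the job that a library on the "Python Lib Path" was actually picked up, a small diagnostic like the one below could be placed at the top of the script. This is a minimal sketch using only the standard library; the helper names are my own, and it only checks top-level module names.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the top-level module `name` can be found on the import path."""
    return importlib.util.find_spec(name) is not None


def import_report(names):
    """Map each module name to whether it is currently importable."""
    return {name: module_available(name) for name in names}


# Print which libraries the job can see, and where Python is looking for them.
print(import_report(["s3fs", "smart_open"]))
print(sys.path)
```

If a module shows up as False, the printed sys.path shows which directories were searched, which helps tell a packaging problem apart from a wrong S3 path.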

Upvotes: 0

Alex Hall

Reputation: 36043

No. The GitHub repository makes it clear that it's entirely Python: https://github.com/RaRe-Technologies/smart_open
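
For packages that publish wheels, the wheel filename gives another quick hint: by convention, pure-Python wheels carry the ABI tag "none" and the platform tag "any". The sketch below is a heuristic based on that naming convention only, and the helper name is hypothetical. The numpy filename is just an illustrative example of a platform-specific wheel.

```python
def is_pure_python_wheel(filename):
    """Heuristic: True if a wheel filename ends in the tags `none-any`.

    Wheel names look like
    {name}-{version}[-{build}]-{python tag}-{abi tag}-{platform tag}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 when the optional build tag is present
        raise ValueError(f"unexpected wheel filename: {filename}")
    _python_tag, abi_tag, platform_tag = parts[-3:]
    return abi_tag == "none" and platform_tag == "any"


print(is_pure_python_wheel("s3fs-0.4.2-py3-none-any.whl"))
print(is_pure_python_wheel("numpy-1.19.5-cp36-cp36m-manylinux1_x86_64.whl"))
```

The first call reports a pure-Python wheel and the second a compiled, platform-specific one.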

Upvotes: 2
