Regressor

Reputation: 1973

AWS lambda building external dependency libraries in python

I am trying to create an AWS Lambda function using Python. Instead of writing an inline function, I want to create a zip deployment package and then upload it to my AWS environment. My source code is in test.py, and dependencies like numpy, sklearn and so on sit in the same folder as the source code.

I get the following error when I test my Lambda function:

Unable to import module 'test': No module named 'sklearn.__check_build._check_build'
___________________________________________________________________________
Contents of /var/task/sklearn/__check_build:
setup.py  __pycache__  _check_build.cp36-win_amd64.pyd  __init__.py
___________________________________________________________________________
It seems that scikit-learn has not been built correctly.
If you have installed scikit-learn from source, please do not forget to build
the package before using it: run python setup.py install or make in the
source directory. If you have used an installer, please check that it is
suited for your Python version, your operating system and your platform.

Here is my python source code which resides in test.py

from sklearn.model_selection import train_test_split
print('Loading function')


def lambda_handler(event, context):
    #print("Received event: " + json.dumps(event, indent=2))
    print("value1 is " + event['key1'])
    print("value2 is " + event['key2'])
    print("value3 is " + event['key3']) 
    return event
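For reference, the handler can be exercised locally with a sample event. This is a minimal sketch: the sklearn import is left out so the snippet runs standalone, and the event keys follow the Lambda console's default test event.

```python
# Minimal local smoke test for the handler above
# (sklearn import omitted so this runs without the broken package).

def lambda_handler(event, context):
    print("value1 is " + event['key1'])
    print("value2 is " + event['key2'])
    print("value3 is " + event['key3'])
    return event

# Sample event shaped like the Lambda console's default test event
event = {'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}
result = lambda_handler(event, None)
print(result == event)  # the handler echoes the event back unchanged
```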

I face a similar issue when I import numpy in my source code (cannot import multiarray).

I am installing each library using pip install numpy/scikit-learn -t /path/to/mydir/.

Here is the folder structure after I use pip install commands

Kindly help me resolve this issue. Thanks!

Upvotes: 3

Views: 4242

Answers (1)

brianz

Reputation: 7448

There are likely two issues here:

  1. Python packages that have C bindings need to be built (pip install) on a machine with the same architecture as the one Lambda functions run on (i.e., Linux).
  2. With AWS Lambda, you are responsible for managing Python's path so it can find your dependencies. You likely need to update the path at runtime.

To solve #1, I use the official Python Docker image.

docker run --rm -it \
        -v `pwd`:/code \
        python:2 bash

Now, whenever you run pip install -t lib numpy or the like inside the container, you will get the correct .so files. The trick here is the volume argument (-v), so that when you shut the container down, the lib directory is preserved on your host machine.

To solve #2, I always structure my serverless/lambda project like this:

$ tree -L 2
.
├── handler.py
├── lib
│   └── numpy
└── serverless.yml

That is, all of my dependencies go inside lib.

pip install -t lib numpy

At the top of handler.py, I always have these 4 lines:

import os
import sys

CWD = os.path.dirname(os.path.realpath(__file__))
sys.path.insert(0, os.path.join(CWD, "lib"))

# now it's ok to import extra libraries
import numpy as np

After the sys.path.insert, any imports for your packages will work.
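The effect of that path tweak can be sketched end-to-end. This toy example vendors a stand-in module into a lib directory and imports it after the insert; the module name and temp-directory layout are purely illustrative.

```python
import os
import sys
import tempfile

# Simulate a project with a vendored "lib" directory, as in the layout above.
project = tempfile.mkdtemp()
lib = os.path.join(project, "lib")
os.makedirs(lib)

# Drop a stand-in dependency into lib
# (in a real project this is what pip install -t lib <pkg> produces).
with open(os.path.join(lib, "vendored_dep.py"), "w") as f:
    f.write("VALUE = 42\n")

# Same idea as the two lines at the top of handler.py,
# pointed at our temporary project directory.
sys.path.insert(0, lib)

import vendored_dep  # found via the path we just inserted
print(vendored_dep.VALUE)  # → 42
```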

Upvotes: 7
