Tanmay Dhyani

Reputation: 61

How to run python code on AWS lambda with package dependencies >500MB?

The requirement is that I have to trigger a SageMaker endpoint from Lambda to get predictions (which is easy), but I also have to do some extra processing for variable importance using packages such as XGBoost and SHAP.

I am able to hit the endpoint and get variable importance using the SageMaker Jupyter notebook. Now I want to replicate the same thing on AWS Lambda.
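For reference, the endpoint invocation that already works looks roughly like this in a Lambda handler (a minimal sketch; the endpoint name and CSV payload are placeholders and depend on how the model was deployed):

    import boto3

    runtime = boto3.client("sagemaker-runtime")

    def handler(event, context):
        # Placeholder endpoint name and payload; adjust to your deployment.
        response = runtime.invoke_endpoint(
            EndpointName="my-xgboost-endpoint",
            ContentType="text/csv",
            Body="1.0,2.0,3.0",
        )
        prediction = response["Body"].read().decode("utf-8")
        return {"prediction": prediction}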

1) How can I run Python code on AWS Lambda with package dependencies for Pandas, XGBoost and SHAP (total package size greater than 500 MB)? The unzipped deployment package is larger than 250 MB, so Lambda will not let me deploy it. I even tried creating the Lambda function from Cloud9 and got the same error due to the size restriction. I have also tried Lambda layers, but no luck.

2) Is there a way for me to run code with such big packages on or through Lambda, bypassing the 250 MB deployment package size limitation?

3) Is there a way to trigger a SageMaker notebook execution from Lambda that would do the calculations and return the output back to Lambda?

Upvotes: 6

Views: 1848

Answers (4)

Eladio

Reputation: 459

In addition to using multiple layers for your dependencies, you may want to shrink the *.so files with the Linux strip command, which discards symbols from compiled object files that may not be necessary in production.

In order to strip all *.so files:

  1. Use a Linux/Docker container with access to your dependencies directory
  2. cd to your dependencies directory
  3. Run
find . -name '*.so' -exec strip {} \;

This will execute the strip command on every *.so file in the current working directory, recursively.

It helped me reduce one of my dependency objects from 94 MB to just 7 MB.

Upvotes: 3

user2555515

Reputation: 1029

I found the 250 MB limit on AWS Lambda deployment size to be draconian. A single file, libxgboost.so from the xgboost package, is already around 140 MB, which leaves only 110 MB for everything else. That makes AWS Lambda useless for anything but simple "hello world" stuff. As an ugly workaround, you can store the xgboost package somewhere on S3, copy it to the /tmp folder from the Lambda invocation routine, and point your Python path to it. The allowed /tmp space is a bit larger (512 MB), so it might work. I am not sure, though, whether the /tmp folder is cleared between Lambda invocations.
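A minimal sketch of that workaround, assuming the dependencies are pre-zipped as xgboost_deps.zip in a bucket named my-artifacts-bucket (both names are placeholders):

    import os
    import sys
    import zipfile

    import boto3

    DEPS_DIR = "/tmp/deps"

    def _load_deps():
        # Download and unpack only when needed; a warm container may still
        # have the files sitting in /tmp from a previous invocation.
        if not os.path.isdir(DEPS_DIR):
            s3 = boto3.client("s3")
            s3.download_file("my-artifacts-bucket", "xgboost_deps.zip", "/tmp/deps.zip")
            with zipfile.ZipFile("/tmp/deps.zip") as zf:
                zf.extractall(DEPS_DIR)
        if DEPS_DIR not in sys.path:
            sys.path.insert(0, DEPS_DIR)

    def handler(event, context):
        _load_deps()
        import xgboost  # resolved from /tmp/deps after the path tweak
        return {"xgboost_version": xgboost.__version__}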

Upvotes: 1

raj

Reputation: 1213

You can try using SageMaker Inference Pipelines to do pre-processing before making the actual predictions. Basically, you can use the same pre-processing script for inference that you used for training. When the pipeline model is deployed, the full set of containers with the pre-processing tasks is installed and run on each EC2 instance in the endpoint or transform job. Feature processing and inference are executed with low latency because the containers deployed in an inference pipeline are co-located on the same EC2 instance (endpoint). You can refer to the documentation here.
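As a rough illustration, deploying such a pipeline with the SageMaker Python SDK (v2) might look like the sketch below; the container URIs, model artifacts, and role ARN are all placeholders:

    from sagemaker.model import Model
    from sagemaker.pipeline import PipelineModel

    role = "arn:aws:iam::123456789012:role/SageMakerRole"  # placeholder

    # Placeholder models: in practice these point at your own pre-processing
    # container and trained XGBoost artifact.
    preprocess_model = Model(
        image_uri="<preprocessing-container-uri>",
        model_data="s3://my-bucket/preprocess/model.tar.gz",
        role=role,
    )
    xgb_model = Model(
        image_uri="<xgboost-container-uri>",
        model_data="s3://my-bucket/xgboost/model.tar.gz",
        role=role,
    )

    pipeline_model = PipelineModel(
        name="preprocess-then-predict",
        role=role,
        models=[preprocess_model, xgb_model],
    )

    # Both containers are co-located on the same instances behind one endpoint.
    pipeline_model.deploy(
        initial_instance_count=1,
        instance_type="ml.m5.large",
        endpoint_name="pipeline-endpoint",
    )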

The following blog posts/notebooks cover this feature in detail:

Upvotes: 0

Tuong Le

Reputation: 19220

Try uploading your dependencies as a Lambda layer. FYI: https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html
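For example, a layer zip (with its contents under a top-level python/ directory, per the linked docs) can be published with boto3 roughly like this; the file and layer names are placeholders:

    import boto3

    lambda_client = boto3.client("lambda")

    # pandas_layer.zip is a placeholder; its contents must live under a
    # top-level "python/" directory so Lambda puts them on the import path.
    with open("pandas_layer.zip", "rb") as f:
        response = lambda_client.publish_layer_version(
            LayerName="pandas-layer",
            Content={"ZipFile": f.read()},
            CompatibleRuntimes=["python3.8"],
        )

    print(response["LayerVersionArn"])  # attach this ARN to the Lambda function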

Upvotes: 3
