Reputation: 191
I'm trying to use scrapy in an AWS Lambda function by packaging it as a layer.
I used pip to install scrapy in my directory:
pip install scrapy
The directory layout is the same as the other layers I already have working. I zipped it, uploaded it as a layer, and added the layer to the Lambda function. I import scrapy:
import scrapy
and when I run the function I get this error:
{
"errorMessage": "Unable to import module 'lambda_function'"
}
and
Unable to import module 'lambda_function': /opt/python/lxml/etree.so: invalid ELF header
Upvotes: 2
Views: 1965
Reputation: 1470
As the comment by @balderman suggests, you need native libraries for scrapy to run. This is very much doable; I'll try to explain it as simply as possible.
The binaries for scrapy have to be compiled in the same environment as a Lambda instance, and Lambda runs on Amazon Linux.
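A quick way to confirm that this is the cause is to run file on the extension named in the error message, inside the directory you zipped (the path is an assumption based on the error):

$ file python/lxml/etree.so
# a Lambda-compatible build reports something like "ELF 64-bit LSB shared object, x86-64"
# a build made on macOS or Windows reports Mach-O or PE instead, which is what
# triggers the "invalid ELF header" error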
You can either boot up an EC2 instance running Amazon Linux or use Docker; the easiest way is to start a Docker container:
$ sudo docker run -it amazonlinux bash
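Inside that container you can install Python and the build dependencies, build scrapy into a target folder, and copy the result back out. A rough sketch (package names, paths, and the container ID are placeholders; match the Python version to your Lambda runtime):

# inside the amazonlinux container
yum install -y python3 python3-pip python3-devel gcc libxml2-devel libxslt-devel openssl-devel zip
pip3 install scrapy -t /layer/python/
exit
# back on the host, copy the build out of the (now stopped) container
$ sudo docker cp <container_id>:/layer ./scrapy-layer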
Now you need to collect all the required .so files into a directory, keeping them inside a folder called lib. After zipping, the archive should look something like this:
.
├── lib
│   ├── libcrypto.so.10
│   ├── libcrypto.so.1.0.2k
│   ├── libfontconfig.so.1
│   ├── libfontconfig.so.1.7.0
.......
Then zip that directory and upload it as a layer. Its contents are extracted to /opt/ in your Lambda container, and AWS looks for library files under /opt/lib among many other locations.
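A minimal sketch of packaging and publishing, assuming the layout above and using the AWS CLI (layer name and runtime are examples):

$ cd scrapy-layer
$ zip -r ../scrapy-layer.zip python lib
$ aws lambda publish-layer-version --layer-name scrapy \
      --zip-file fileb://../scrapy-layer.zip \
      --compatible-runtimes python3.7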
The challenging part for you will be figuring out which .so files scrapy needs in order to run properly.
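Inside the same Amazon Linux container, ldd can help with that: run it against the compiled extension modules and copy any libraries it lists that are not already present in the Lambda runtime into the lib folder (paths below are examples):

# list the shared libraries lxml's compiled extension links against
ldd /layer/python/lxml/etree*.so
# copy the missing ones into the layer's lib folder
mkdir -p /layer/lib
cp /usr/lib64/libxml2.so.2 /usr/lib64/libxslt.so.1 /layer/lib/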
Upvotes: 3