Joey Yi Zhao

Reputation: 42520

How to run a `scrapy` project as a regular `python` app in order to run it from lambda?

I am creating a scrapy project and the structure looks like:

[screenshot of the scrapy project structure]

I can run the spider from the command line with `scrapy crawl SPIDER_NAME`, but how can I package the project as a regular Python program that can run in AWS Lambda? From the `scrapy crawl SPIDER_NAME` command, I don't know the entry point of the program. Lambda requires a handler method as its entry point, so how can I trigger the scrapy task programmatically?

Upvotes: 0

Views: 334

Answers (2)

Nikolay Grishchenko

Reputation: 911

You should include scrapy in your Lambda package, e.g.:

pip install scrapy -t YOUR_LAMBDA_ROOT_DIR

If multiple Lambdas will use scrapy, it is recommended to install it as a Lambda Layer to simplify deployment and maintenance. Make sure that scrapy and all of its dependencies (especially binary ones) are available in your Lambda package.

One approach to running scrapy in a Lambda is to create a scrapy.crawler.Crawler in your lambda_function module and call its crawl() method from the lambda_handler. Note that crawl() returns a twisted Deferred, so you also need to run the Twisted reactor until the crawl finishes.

https://docs.scrapy.org/en/latest/topics/api.html

from twisted.internet import reactor

from scrapy.crawler import Crawler
from scrapy.settings import Settings

def lambda_handler(event, context):
    settings = Settings(YOUR_SETTINGS)
    crawler = Crawler(spidercls=YOUR_SPIDER, settings=settings)
    d = crawler.crawl()  # returns a twisted Deferred, not the crawl results
    d.addBoth(lambda _: reactor.stop())  # stop the reactor when the crawl ends
    reactor.run()  # blocks until the crawl finishes

Please note that you may hit Lambda execution time limits, so you will probably need to split your data across multiple invocations. The sosw package could be useful to simplify that.
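To illustrate the chunking idea, here is a minimal sketch (the URL list and chunk size are made up for the example) of splitting the work so that each Lambda invocation crawls one small batch and stays well under the time limit:

```python
def chunk(items, size):
    """Split a list into batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# Hypothetical seed URLs; in practice these might come from the event payload
# or from a queue that an orchestrator (such as sosw) feeds.
urls = [f"https://example.com/page/{n}" for n in range(10)]

# Each batch could be passed as the `event` of a separate Lambda invocation.
batches = chunk(urls, 4)
print(len(batches))  # → 3 batches: 4 + 4 + 2 URLs
```

A driver Lambda (or sosw orchestrator) would then invoke the crawler Lambda once per batch instead of crawling everything in a single run.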

Upvotes: 1

0x01h

Reputation: 925

`import scrapy` in your AWS Python Lambda function.

Upvotes: 0
