Reputation: 42520
I am creating a scrapy project and the structure looks like:
I can run the app from the command line with scrapy crawl SPIDER_NAME, but how can I package the app as a regular Python program that can run in AWS Lambda?
From the command line scrapy crawl SPIDER_NAME, I don't know the entry point for the program. Lambda requires a handler method as its entry point, so how can I trigger the scrapy task programmatically?
Upvotes: 0
Views: 334
Reputation: 911
You should include scrapy in your Lambda package, e.g.:
pip install scrapy -t YOUR_LAMBDA_ROOT_DIR
If you have multiple Lambdas using scrapy, it is recommended to install it as a Lambda Layer to simplify deployment and maintenance. Make sure that scrapy and all of its dependencies (especially the binary ones) are available from your Lambda package.
In order to use scrapy as a Lambda, one approach is to create a crawler in your lambda_function and start the crawl from the lambda_handler. Note that scrapy.crawler.Crawler.crawl() returns a Twisted Deferred and needs a running reactor, so the simplest way is scrapy.crawler.CrawlerProcess, which manages the reactor for you:
https://docs.scrapy.org/en/latest/topics/api.html

from scrapy.crawler import CrawlerProcess
from scrapy.settings import Settings

def lambda_handler(event, context):
    settings = Settings(YOUR_SETTINGS)
    process = CrawlerProcess(settings=settings)
    process.crawl(YOUR_SPIDER)
    process.start()  # blocks until the crawl finishes
Please note that you may hit Lambda execution time limits, so you will probably need to chunk your data across multiple invocations. The sosw package could be useful to simplify that.
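As a minimal sketch of that chunking idea (assuming the start URLs can be partitioned up front; the chunk helper and the "urls" event key are hypothetical names, not part of scrapy or Lambda):

```python
# Hypothetical sketch: split the work across Lambda invocations by
# sending each invocation a slice of the start URLs in its event payload.
def chunk(items, size):
    """Yield successive slices of `items` with at most `size` elements each."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

urls = ["https://example.com/page/%d" % n for n in range(10)]
events = [{"urls": batch} for batch in chunk(urls, 4)]
# 10 URLs in batches of 4 -> 3 events (4 + 4 + 2 URLs),
# each event passed to a separate lambda_handler invocation
```

Each event would then drive one crawl that stays within the time limit.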
Upvotes: 1