user6723321


Running Scrapy in a docker container

I am setting up a new application which I would like to package using docker-compose. Currently, in one container I have a Flask-Admin application which also exposes an API for interacting with the database. I will then have lots of scrapers that need to run once a day. These scrapers should scrape the data, reformat it, and then send it to the API. I expect I should have another Docker container running for the scrapers.
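Roughly, the compose layout I am picturing is the following (service names and build paths are just placeholders):

    # docker-compose.yml -- service names and build paths are placeholders
    version: "3.8"
    services:
      web:
        build: ./web            # Flask-Admin app + API
        ports:
          - "5000:5000"
      scrapers:
        build: ./scrapers       # image containing the Scrapy spiders
        depends_on:
          - web                 # the scrapers send results to the web API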

Currently, on my local machine I run scrapy runspider myspider.py to run each spider.

What would be the best way to have multiple scrapers in one container and have them scheduled to run at various points during the day?

Upvotes: 2

Views: 2825

Answers (2)

Syed Haider Ali Zaidi

Reputation: 11

You can containerize each scraper and use Kubernetes to manage the containers. This works well for scalability and fault tolerance, since Kubernetes automatically restarts failed scrapers and gives you good scheduling and management capabilities.
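For a spider that runs once a day, a Kubernetes CronJob is the natural object to use. A minimal sketch, assuming a prebuilt scraper image (the image name, spider file, and schedule below are placeholders):

    # cronjob.yaml -- image name, spider file, and schedule are placeholders
    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: myspider
    spec:
      schedule: "0 2 * * *"              # once a day at 02:00
      jobTemplate:
        spec:
          template:
            spec:
              containers:
                - name: myspider
                  image: registry.example.com/scrapers:latest
                  command: ["scrapy", "runspider", "myspider.py"]
              restartPolicy: OnFailure   # rerun the pod if the spider fails

Apply it with kubectl apply -f cronjob.yaml; one CronJob per spider lets each run on its own schedule.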

Upvotes: 0

Jay Atkinson

Reputation: 3287

You could configure the Docker container that has the scrapers to use cron to fire off the spiders at appropriate times. Here's an example: "Run a cron job with Docker".
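A minimal sketch of that setup, assuming a Debian-based Python image and placeholder spider names: install cron in the image, load a crontab, and run cron in the foreground as the container's main process.

    # Dockerfile -- base image, paths, and spider names are assumptions
    FROM python:3.11-slim

    # cron is not in the base image, so install it alongside the Python deps
    RUN apt-get update && apt-get install -y --no-install-recommends cron \
        && rm -rf /var/lib/apt/lists/*

    WORKDIR /app
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt
    COPY . .

    # load the schedule below as root's crontab
    COPY scraper-crontab /app/scraper-crontab
    RUN crontab /app/scraper-crontab

    # run cron in the foreground so it keeps the container alive
    CMD ["cron", "-f"]

The crontab holds one line per spider, each at its own time of day:

    # scraper-crontab -- spider names and times are placeholders
    # redirecting to /proc/1/fd/1 sends output to the container's stdout
    0 2 * * * cd /app && scrapy runspider myspider.py >> /proc/1/fd/1 2>&1
    0 3 * * * cd /app && scrapy runspider otherspider.py >> /proc/1/fd/1 2>&1

One caveat: cron starts jobs with a nearly empty environment, so any environment variables the spiders need (such as the API's URL) have to be written into the crontab lines or into a file the jobs read themselves.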

Upvotes: 2
