Reputation: 467
Now I want to build a distributed scraper with Scrapy and Celery. My current idea is to use a master-slave architecture. Can someone tell me whether that is a good approach? Is there a good open-source project for this?
Upvotes: 1
Views: 797
Reputation: 11155
When I implemented a distributed crawling setup, I achieved it with the help of Redis. Here is how I did it.
I had a list of domains to be crawled. I uploaded those domains to Redis; in my project, there were 30K domains to scrape data from.
Then I used the redis-py client to talk to Redis and feed each URL to Scrapy.
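The pattern above can be sketched roughly as follows. In a real deployment you would use `redis.Redis(...)` from redis-py and Redis's `LPUSH`/`RPOP` list commands; here a tiny in-memory stand-in with the same method names keeps the example self-contained, and the key name `domains:pending` is just an assumption for illustration.

```python
from collections import deque


class FakeRedis:
    """Minimal in-memory stand-in mimicking redis-py's list commands."""

    def __init__(self):
        self.lists = {}

    def lpush(self, key, *values):
        # Like Redis LPUSH: push values onto the left of the list.
        self.lists.setdefault(key, deque()).extendleft(values)

    def rpop(self, key):
        # Like Redis RPOP: pop one value from the right, or None if empty.
        q = self.lists.get(key)
        return q.pop() if q else None


def seed_domains(client, domains, key="domains:pending"):
    """Master side: upload the full domain list into a Redis list."""
    client.lpush(key, *domains)


def next_url(client, key="domains:pending"):
    """Worker side: pop one domain and turn it into a start URL.

    A Scrapy spider would call this (e.g. in start_requests) and
    yield a scrapy.Request for each URL returned.
    """
    domain = client.rpop(key)
    return f"https://{domain}/" if domain else None


r = FakeRedis()
seed_domains(r, ["example.com", "example.org"])
print(next_url(r))  # -> https://example.com/
print(next_url(r))  # -> https://example.org/
print(next_url(r))  # -> None (queue drained)
```

Because `RPOP` is atomic in real Redis, many worker processes can pop from the same list concurrently without handing the same domain to two workers, which is what makes this a simple work queue.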
Upvotes: 2