Reputation: 41
I've got a list of about 1000 URLs and I need to extract the same kind of data from each one. Is there a way to get Scrapy to "deploy" multiple spiders at once, so each takes a URL from the list, parses the page, and then outputs to a common dictionary? I'm thinking of using 10 or more spiders to do this.
Upvotes: 2
Views: 880
Reputation: 73
Have you tried solving this without using multiple spiders?
Try simply adding all the URLs to the 'start_urls' list, or reading the list of URLs from a file in the 'start_requests' method, and adjust the level of concurrency using Scrapy's settings such as 'CONCURRENT_REQUESTS' and 'CONCURRENT_ITEMS', like:
custom_settings = {
    'CONCURRENT_REQUESTS': 1000,
    'CONCURRENT_ITEMS': 10000,
}
Or whatever values fit your task better.
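For example, a single spider along these lines might look like the sketch below. This is a minimal illustration, not a drop-in solution: the spider name, the 'urls.txt' file, and the CSS selectors are placeholders you would replace with your own.

import scrapy

class BulkURLSpider(scrapy.Spider):
    # Hypothetical spider name for illustration
    name = 'bulk_urls'

    # Per-spider overrides of Scrapy's global concurrency defaults
    custom_settings = {
        'CONCURRENT_REQUESTS': 100,
        'CONCURRENT_ITEMS': 1000,
    }

    def start_requests(self):
        # Assumes a plain-text file with one URL per line
        with open('urls.txt') as f:
            for line in f:
                url = line.strip()
                if url:
                    yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        # Extract the same fields from every page; selectors are placeholders
        yield {
            'url': response.url,
            'title': response.css('title::text').get(),
        }

One spider with a high concurrency limit will usually saturate your bandwidth or the target servers long before the number of spider processes becomes the bottleneck, which is why a single spider is generally enough for 1000 URLs.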
P.S. Generating a number of Scrapy spiders from the list of URLs and running them concurrently with Scrapyd (http://scrapyd.readthedocs.io/en/stable/) is also an option, although it seems a bit hacky to me.
Upvotes: 1