Reputation: 41
I've got a list of about 1000 URLs and I need to extract the same kind of data from each one. Is there a way to get Scrapy to "deploy" multiple spiders at once, so each takes a URL from the list, parses the page, and then outputs to a common dictionary? I'm thinking of using 10 or more spiders to do this.
Upvotes: 2
Views: 880
Reputation: 73
Have you tried solving this without using multiple spiders?
Try simply adding all the URLs to the 'start_urls' list, or reading the list of URLs from a file in the 'start_requests' method, and adjust the level of concurrency using Scrapy's settings such as 'CONCURRENT_REQUESTS' and 'CONCURRENT_ITEMS', like:
custom_settings = {
    'CONCURRENT_REQUESTS': 1000,
    'CONCURRENT_ITEMS': 10000,
}
Or whatever values fit your task better.
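For example, a single spider along these lines might look like the sketch below. This is a minimal illustration, not a drop-in solution: the spider name, the 'urls.txt' file, and the CSS selectors are placeholders you would replace with your own.

import scrapy

class BulkURLSpider(scrapy.Spider):
    # Hypothetical spider name for illustration
    name = 'bulk_urls'

    # Per-spider overrides of Scrapy's global concurrency defaults
    custom_settings = {
        'CONCURRENT_REQUESTS': 100,
        'CONCURRENT_ITEMS': 1000,
    }

    def start_requests(self):
        # Assumes a plain-text file with one URL per line
        with open('urls.txt') as f:
            for line in f:
                url = line.strip()
                if url:
                    yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        # Extract the same fields from every page; selectors are placeholders
        yield {
            'url': response.url,
            'title': response.css('title::text').get(),
        }

One spider with a high concurrency limit will usually saturate your bandwidth or the target servers long before the number of spider processes becomes the bottleneck, which is why a single spider is generally enough for 1000 URLs.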
P.S. Generating a number of Scrapy spiders from the list of URLs and running them concurrently with Scrapyd (http://scrapyd.readthedocs.io/en/stable/) is also an option, although it seems a bit hacky to me.
Upvotes: 1