How to Specify different Process settings for two different spiders in CrawlerProcess Scrapy?

Question

I have two spiders which I want to execute in parallel. I used the CrawlerProcess instance and its crawl method to acheieve this. However, I want to specify different output file, ie FEED_URI for each spider in the same process. I tried to loop the spiders and run them as shown below. Though two different output files are generated, the process terminates as soon as the second spider completes execution. If the first spider completes crawling before the second one, I get the desired output. However, if the second spider finishes crawling first, then it doesn't wait for the first spider to complete. How could I actually fix this?

from scrapy.utils.project import get_project_settings
from scrapy.crawler import CrawlerProcess

setting = get_project_settings()
process = CrawlerProcess(setting)

for spider_name in process.spider_loader.list():
    setting['FEED_FORMAT'] = 'json'
    setting['LOG_LEVEL'] = 'INFO'
    setting['FEED_URI'] = spider_name+'.json'
    setting['LOG_FILE'] = spider_name+'.log'
    process = CrawlerProcess(setting)
    print("Running spider %s" % spider_name)
    process.crawl(spider_name)

process.start()
print("Completed")

How to Specify different Process settings for two different spiders in CrawlerProcess Scrapy?

Answers (1)

Related Questions