Reputation: 253
I have a script that I need to run after my spider closes. I see that Scrapy has a signal called spider_closed(), but what I don't understand is how to incorporate it into my script. What I am looking to do is: once the scraper is done crawling, I want to combine all my CSV files and then load them to Sheets. If anyone has any examples of how this can be done, that would be great.
Upvotes: 2
Views: 4126
Reputation: 2536
As per the comments on my other answer about a signal-based solution, here is a way to run some code after multiple spiders are done. This does not involve using the spider_closed signal.
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
process = CrawlerProcess(get_project_settings())
process.crawl('spider1')
process.crawl('spider2')
process.crawl('spider3')
process.crawl('spider4')
process.start()
# CSV combination code goes here. It will only run when all the spiders are done.
# ...
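For illustration, here is a minimal sketch of what that combination step could look like, assuming each spider exports its items to a file matching output_*.csv and that pandas is available (both of these are assumptions, not part of the original answer):

import glob

import pandas as pd

# Assumption: each spider writes its results to a file named
# output_<something>.csv (e.g. via the FEEDS setting); adjust the
# glob pattern to match your project.
parts = [pd.read_csv(path) for path in sorted(glob.glob("output_*.csv"))]
combined = pd.concat(parts, ignore_index=True)
combined.to_csv("combined.csv", index=False)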
Upvotes: 1
Reputation: 2536
As per the example in the documentation, you add the following to your Spider:
from scrapy import signals  # required for the signals.spider_closed reference below

# Both methods below go inside your spider class.

# This method is taken as-is from the documentation example.
@classmethod
def from_crawler(cls, crawler, *args, **kwargs):
    spider = super().from_crawler(crawler, *args, **kwargs)
    crawler.signals.connect(spider.spider_closed, signal=signals.spider_closed)
    return spider

# This is where you do your CSV combination.
def spider_closed(self, spider):
    # Whatever is here will run when the spider is done.
    combine_csv_to_sheet()
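The combine_csv_to_sheet() function is not defined in the answer; here is one hedged sketch of what it might look like, assuming the spiders write files matching output_*.csv, that gspread is used for the Sheets upload, and that a service-account credentials file and a spreadsheet named "Scraped data" already exist (all of these are assumptions):

import csv
import glob

import gspread  # assumption: gspread handles the Google Sheets upload


def combine_csv_to_sheet():
    # Collect rows from every CSV, keeping only the first file's header.
    rows = []
    for i, path in enumerate(sorted(glob.glob("output_*.csv"))):
        with open(path, newline="") as f:
            reader = csv.reader(f)
            if i > 0:
                next(reader, None)  # skip the repeated header row
            rows.extend(reader)

    # Assumption: "credentials.json" is a service-account key and the
    # spreadsheet "Scraped data" is shared with that service account.
    gc = gspread.service_account(filename="credentials.json")
    worksheet = gc.open("Scraped data").sheet1
    worksheet.append_rows(rows)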
Upvotes: 5