dcarlo56ave

Reputation: 253

Scrapy Spider Close

I have a script that I need to run after my spider closes. I see that Scrapy has a signal called spider_closed(), but what I don't understand is how to incorporate it into my script. What I am looking to do is: once the scraper is done crawling, combine all my CSV files and then load them to Sheets. If anyone has any examples of how this can be done, that would be great.

Upvotes: 2

Views: 4126

Answers (2)

malberts

Reputation: 2536

As per the comments on my other answer about a signal-based solution, here is a way to run some code after multiple spiders are done. This does not involve using the spider_closed signal.

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings


# Run all spiders in one process; each name must match a spider's
# `name` attribute in your project.
process = CrawlerProcess(get_project_settings())
process.crawl('spider1')
process.crawl('spider2')
process.crawl('spider3')
process.crawl('spider4')
process.start()  # blocks here until every crawl has finished

# CSV combination code goes here. It will only run when all the spiders are done.
# ...
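
If it helps, here is one possible shape for that combination step: a minimal sketch assuming each spider exported its results as a CSV into an output/ directory and that all files share the same columns. The output/*.csv pattern and the combined.csv name are placeholders for your own paths.

import glob

import pandas as pd

# Gather every per-spider CSV (assumed to live in output/) and
# concatenate them into one frame, then write a single combined file.
files = sorted(glob.glob('output/*.csv'))
combined = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)
combined.to_csv('combined.csv', index=False)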

Upvotes: 1

malberts

Reputation: 2536

As per the example in the documentation, you add the following to your Spider:

# At the top of your spider module:
from scrapy import signals


# This method remains as-is.
@classmethod
def from_crawler(cls, crawler, *args, **kwargs):
    spider = super().from_crawler(crawler, *args, **kwargs)
    crawler.signals.connect(spider.spider_closed, signal=signals.spider_closed)
    return spider

# This is where you do your CSV combination.
def spider_closed(self, spider):
    # Whatever is here will run when the spider is done.
    combine_csv_to_sheet()
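
The combine_csv_to_sheet() part is left up to you. One possible sketch, assuming the gspread library with a service-account key in credentials.json, a spreadsheet named 'Scraped Data' shared with that service account, and per-spider CSVs in output/ — all of those names are placeholders:

import csv
import glob

import gspread

def combine_csv_to_sheet():
    # Collect rows from every CSV, keeping the header only once.
    rows = []
    for i, path in enumerate(sorted(glob.glob('output/*.csv'))):
        with open(path, newline='') as f:
            reader = list(csv.reader(f))
        rows.extend(reader if i == 0 else reader[1:])

    # Push everything to the first worksheet of the target spreadsheet.
    gc = gspread.service_account(filename='credentials.json')
    ws = gc.open('Scraped Data').sheet1
    ws.clear()
    ws.append_rows(rows)

append_rows() is used rather than update() to keep the sketch version-agnostic, since the argument order of Worksheet.update() changed between gspread releases.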

Upvotes: 5
