Shiva Krishna Bavandla

Reputation: 26648

Recording the total time taken for running a spider in scrapy

I am using Scrapy to scrape a site.

I wrote a spider that fetches all the items from the page and saves them to a CSV file. Now I want to record the total execution time Scrapy takes to run the spider. After the spider finishes, the terminal displays some results such as the start time, finish time, and so on. So in my program I need to calculate the total time Scrapy takes to run the spider and store that time somewhere.

Can anyone show me how to do this with an example?

Thanks in advance.

Upvotes: 6

Views: 3815

Answers (3)

SIM

Reputation: 22440

The easiest way I've found so far:

import scrapy

class StackoverflowSpider(scrapy.Spider):
    name = "stackoverflow"

    start_urls = ['https://stackoverflow.com/questions/tagged/web-scraping']

    def parse(self, response):
        for title in response.css(".summary .question-hyperlink::text").getall():
            yield {"Title": title}

    def close(self, reason):
        start_time = self.crawler.stats.get_value("start_time")
        finish_time = self.crawler.stats.get_value("finish_time")
        print("Total run time:", finish_time - start_time)
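Since the question also asks to store the total time somewhere, the difference between the two stats values is a `datetime.timedelta` that can be written out to a file. A minimal standalone sketch of that step (the timestamps and the `runtime.txt` filename here are made up for illustration; in the spider they would come from the crawler stats):

```python
from datetime import datetime

# Stand-ins for the values the "start_time" / "finish_time" stats would hold
start_time = datetime(2024, 1, 1, 12, 0, 0)
finish_time = datetime(2024, 1, 1, 12, 3, 30)

duration = finish_time - start_time  # a datetime.timedelta

# Persist the run time; total_seconds() turns the timedelta into a float
with open("runtime.txt", "w") as f:
    f.write(f"Total run time: {duration.total_seconds()} seconds\n")
```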

Upvotes: 4

Erick Kondela

Reputation: 149

I'm quite a beginner, but I did it in a slightly simpler way, and I hope it makes sense.

import datetime

then set two instance attributes, self.starting_time and self.ending_time.

Inside the constructor of the spider class, set the starting time as

def __init__(self, name=None, **kwargs):
    super().__init__(name, **kwargs)
    self.starting_time = datetime.datetime.now()

After that, use the closed method to find the difference between the ending time and the starting time, i.e.

def closed(self, reason):
    self.ending_time = datetime.datetime.now()
    duration = self.ending_time - self.starting_time
    print(duration)

That's pretty much it. The closed method is called soon after the spider has finished. See the documentation here.
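The subtraction in closed yields a `datetime.timedelta`, and that arithmetic can be checked on its own with made-up timestamps (these values are purely illustrative):

```python
import datetime

# Stand-ins for the times the spider would record at start and at close
starting_time = datetime.datetime(2024, 1, 1, 10, 0, 0)
ending_time = datetime.datetime(2024, 1, 1, 10, 2, 15)

duration = ending_time - starting_time  # a datetime.timedelta
print(duration)                  # 0:02:15
print(duration.total_seconds())  # 135.0
```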

Upvotes: 1

warvariuc

Reputation: 59594

This could be useful:

(The original answer used the module-level scrapy.xlib.pydispatch dispatcher and the scrapy.stats singleton, both of which have since been removed from Scrapy; the same idea in the current API is to connect a handler to the spider_closed signal in from_crawler.)

from datetime import datetime

import scrapy
from scrapy import signals


class MySpider(scrapy.Spider):
    name = "myspider"

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super().from_crawler(crawler, *args, **kwargs)
        crawler.signals.connect(spider.handle_spider_closed, signals.spider_closed)
        return spider

    def handle_spider_closed(self, spider, reason):
        stats = spider.crawler.stats.get_stats()
        print('Spider closed:', spider.name, stats)
        print('Work time:', datetime.now() - stats['start_time'])

Upvotes: 6
