Reputation: 26648
I am using Scrapy to scrape a site.
I wrote a spider, fetched all the items from the page, and saved them to a CSV file. Now I want to save the total execution time Scrapy takes to run the spider.
After the spider finishes, the terminal displays some results such as start time, finish time and so on. So in my program I need to calculate the total time Scrapy took to run the spider and store it somewhere.
Can anyone show me how to do this with an example?
Thanks in advance.
Upvotes: 6
Views: 3815
Reputation: 22440
The easiest way I've found so far:
import scrapy

class StackoverflowSpider(scrapy.Spider):
    name = "stackoverflow"
    start_urls = ['https://stackoverflow.com/questions/tagged/web-scraping']

    def parse(self, response):
        for title in response.css(".summary .question-hyperlink::text").getall():
            yield {"Title": title}

    def close(self, reason):
        start_time = self.crawler.stats.get_value('start_time')
        finish_time = self.crawler.stats.get_value('finish_time')
        print("Total run time: ", finish_time - start_time)
Upvotes: 4
Reputation: 149
I'm quite a beginner, but I did it in a slightly simpler way and I hope it makes sense.
import datetime
then declare two instance attributes, i.e. self.starting_time and self.ending_time.
Inside the constructor of the spider class, set the starting time:
def __init__(self, name=None, **kwargs):
    super().__init__(name, **kwargs)
    self.starting_time = datetime.datetime.now()
After that, use the closed method to find the difference between the ending and the starting time, i.e.
def closed(self, reason):
    self.ending_time = datetime.datetime.now()
    duration = self.ending_time - self.starting_time
    print(duration)
That's pretty much it. The closed method is called as soon as the spider has finished. See the Scrapy documentation for details.
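Putting the two fragments together, a minimal complete spider along these lines might look as follows (the spider name, URL and selector are placeholders, not part of the original answer):

import datetime

import scrapy


class TimedSpider(scrapy.Spider):
    name = "timed"                          # placeholder name
    start_urls = ["https://example.com"]    # placeholder URL

    def __init__(self, name=None, **kwargs):
        super().__init__(name, **kwargs)
        # Record the moment the spider instance is created.
        self.starting_time = datetime.datetime.now()

    def parse(self, response):
        yield {"title": response.css("title::text").get()}  # placeholder selector

    def closed(self, reason):
        # Called automatically once the spider has finished.
        self.ending_time = datetime.datetime.now()
        duration = self.ending_time - self.starting_time
        print(duration)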
Upvotes: 1
Reputation: 59594
This could be useful (note that it uses the old scrapy.stats API and Python 2 print syntax, i.e. older Scrapy versions):
from scrapy.xlib.pydispatch import dispatcher
from scrapy import signals
from scrapy.stats import stats
from datetime import datetime
def handle_spider_closed(spider, reason):
    print 'Spider closed:', spider.name, stats.get_stats(spider)
    print 'Work time:', datetime.now() - stats.get_stats(spider)['start_time']

dispatcher.connect(handle_spider_closed, signals.spider_closed)
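scrapy.xlib.pydispatch and the module-level scrapy.stats object were removed in later Scrapy releases. For current versions, a rough equivalent connects the handler through crawler.signals inside the spider itself. A minimal sketch (spider name, URL and selector are placeholders), which relies on the start_time and finish_time values recorded by Scrapy's built-in stats collector, as in the first answer:

from scrapy import signals
import scrapy


class MySpider(scrapy.Spider):
    name = "myspider"                       # placeholder name
    start_urls = ["https://example.com"]    # placeholder URL

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super().from_crawler(crawler, *args, **kwargs)
        # Run handle_spider_closed when the spider_closed signal fires.
        crawler.signals.connect(spider.handle_spider_closed, signal=signals.spider_closed)
        return spider

    def parse(self, response):
        yield {"title": response.css("title::text").get()}  # placeholder selector

    def handle_spider_closed(self, spider, reason):
        stats = self.crawler.stats
        print("Spider closed:", spider.name, stats.get_stats())
        # start_time and finish_time are set by Scrapy's stats collector.
        print("Work time:", stats.get_value("finish_time") - stats.get_value("start_time"))

If finish_time happens not to be populated yet when your handler runs (signal-handler order is not something to rely on), you can fall back to computing the end point yourself, as the original snippet does with datetime.now().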
Upvotes: 6