wishmaster

Reputation: 1487

How to know in a pipeline if the spider is closed

I wrote the following pipeline so that the extracted items are written directly into an Excel file. I ran the spider without any errors, but the file isn't being saved. I know that the workbook.close() call is missing; the problem is that I don't know where to put it inside the code.

from datetime import datetime
import xlsxwriter

ordered_list = ['Link', 'Price', 'Date', 'discount']

class guiPipeline(object):

    def __init__(self):
        now = datetime.now()
        workbook = xlsxwriter.Workbook('data.xlsx')
        self.worksheet = workbook.add_worksheet()
        self.write_first_row()
        self.index = 1

    def process_item(self, item, spider):
        for _key, _value in item.items():
            col = ordered_list.index(_key)
            self.worksheet.write(self.index, col, _value)
        self.index += 1
        return item

    def write_first_row(self):
        for header in ordered_list:
            col = ordered_list.index(header)
            self.worksheet.write(0, col, header)

This is my pipeline; I just need to know how to call close() on the workbook when the spider is finished.

Upvotes: 2

Views: 47

Answers (1)

Ryan

Reputation: 2183

Item pipelines have methods that are called when the spider is opened or closed: http://doc.scrapy.org/en/latest/topics/item-pipeline.html#close_spider

You can find an example in the docs here: http://doc.scrapy.org/en/latest/topics/item-pipeline.html#write-items-to-a-json-file

Note that for a pipeline you don't need to wire up any signals yourself: Scrapy calls close_spider(self, spider) automatically when the spider finishes. Define that method and close the workbook inside it, which also means the workbook has to be stored as self.workbook in __init__ rather than as a local variable, so close_spider can reach it.
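For example, a minimal untested sketch of your pipeline with that method added (the only changes are storing the workbook as self.workbook and adding close_spider):

import xlsxwriter

ordered_list = ['Link', 'Price', 'Date', 'discount']

class guiPipeline(object):

    def __init__(self):
        # Keep the workbook on self so close_spider can reach it later.
        self.workbook = xlsxwriter.Workbook('data.xlsx')
        self.worksheet = self.workbook.add_worksheet()
        self.write_first_row()
        self.index = 1

    def process_item(self, item, spider):
        for _key, _value in item.items():
            col = ordered_list.index(_key)
            self.worksheet.write(self.index, col, _value)
        self.index += 1
        return item

    def close_spider(self, spider):
        # Scrapy calls this automatically when the spider finishes;
        # closing the workbook is what actually writes data.xlsx to disk.
        self.workbook.close()

    def write_first_row(self):
        for header in ordered_list:
            col = ordered_list.index(header)
            self.worksheet.write(0, col, header)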

Upvotes: 1
