Convert results after CrawlerProcess ends

Question

I just finished a project with Scrapy. My client wants the results as XLSX, and because I didn't find a way to export them in that format directly, I'm exporting to CSV and then converting the file to XLSX. (If this code can be improved, let me know :)

My problem is that when Python executes csv_2_xlsx(FILE_NAME), the result file does not exist yet. I tried adding a sleep, but it didn't work.

Any help will be welcome :)

My main file looks like this:

# main.py
from scrapy.crawler import CrawlerProcess
from spiders import my_spider
from exporter import csv_2_xlsx

FILE_NAME = 'result.csv'

process = CrawlerProcess({
    'FEED_FORMAT': 'csv',
    'FEED_URI': FILE_NAME,
    'FEED_EXPORTERS' : {
        'csv': 'exporter.FixLineCsvItemExporter',
    }
})

process.crawl(my_spider.MySpider)

# I expected Python to wait here
# until the crawl has finished
process.start()

# this line fails because
# result.csv does not exist yet
csv_2_xlsx(FILE_NAME)
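The question imports `csv_2_xlsx` from `exporter` but does not show it. For comparison, here is a minimal sketch of such a helper. It assumes the third-party `openpyxl` package is installed (`pip install openpyxl`) and that the CSV is UTF-8; the optional `xlsx_path` parameter is a hypothetical addition, not part of the original code. It copies every CSV row, as text, into the first sheet of a new workbook.

```python
import csv
from pathlib import Path

from openpyxl import Workbook  # third-party: pip install openpyxl


def csv_2_xlsx(csv_path, xlsx_path=None):
    """Copy every row of a CSV file into the first sheet of a new .xlsx file.

    Hypothetical sketch: the question's own csv_2_xlsx is not shown, so the
    optional xlsx_path parameter is an assumption.
    """
    csv_path = Path(csv_path)
    if xlsx_path is None:
        # result.csv -> result.xlsx, in the same directory
        xlsx_path = csv_path.with_suffix('.xlsx')

    workbook = Workbook()
    sheet = workbook.active  # the default first worksheet

    # newline='' is what the csv module expects; UTF-8 is an assumption
    with open(csv_path, newline='', encoding='utf-8') as f:
        for row in csv.reader(f):
            sheet.append(row)  # each value is stored as a string

    workbook.save(str(xlsx_path))
    return Path(xlsx_path)
```

Called as `csv_2_xlsx('result.csv')`, this writes `result.xlsx` next to the CSV. Because every value is stored as text, numeric columns would need converting (for example with `float()`) before `sheet.append` if the client needs real numbers in Excel.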

ujhuyz0110 · Accepted Answer

Edited Version

I rearranged your code as follows to solve the problem of the CSV file not being closed before the conversion runs.

main.py

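from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings


# Load the project's settings.py (this is where ITEM_PIPELINES is configured)
process = CrawlerProcess(get_project_settings())

# 'spider_name' is the `name` attribute of your spider class
process.crawl('spider_name')
process.start()  # blocks until the crawl is finished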

pipeline.py

from scrapy.exporters import CsvItemExporter
from exporter import csv_2_xlsx  # your own CSV -> XLSX helper


FILE_NAME = 'result.csv'


class TutorialPipeline:
    def open_spider(self, spider):
        # CsvItemExporter writes bytes, so open the file in binary mode
        self.file = open(FILE_NAME, 'wb')
        self.exporter = CsvItemExporter(self.file)
        self.exporter.start_exporting()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item

    def close_spider(self, spider):
        # Called once, after the last item: finish and close the CSV first
        # so that the converter reads a complete file
        self.exporter.finish_exporting()
        self.file.close()
        csv_2_xlsx(FILE_NAME)
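The pipeline only runs if it is enabled in the project's `settings.py`, which `get_project_settings()` loads. A minimal excerpt, where `myproject` is a placeholder for your project's package name and `pipeline` matches the pipeline.py file name used in this answer:

```python
# settings.py (excerpt)
# 'myproject' is a placeholder: use your own project's package name.
ITEM_PIPELINES = {
    # Values are customarily in the 0-1000 range; lower values run first
    'myproject.pipeline.TutorialPipeline': 300,
}
```

If the project also still sets FEED_FORMAT / FEED_URI for the same file name, the feed exporter and this pipeline would both write result.csv, so it is simplest to keep only one of them.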

Have you tried adding csv_2_xlsx(FILE_NAME) to the pipeline.py file? In the pipeline class, add a close_spider() method and call csv_2_xlsx(FILE_NAME) from it:

def close_spider(self, spider):
    csv_2_xlsx(FILE_NAME)
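To illustrate why calling the converter from close_spider() works, the following framework-free sketch mimics the order in which Scrapy calls a pipeline's hooks: open_spider, then process_item once per item, then close_spider. The class name `CsvThenConvertPipeline`, the `on_closed` callback and the `run` driver are hypothetical, and `csv.DictWriter` stands in for `CsvItemExporter` so that the sketch runs without Scrapy. The point it shows is that the file is closed (and therefore flushed) before the callback reads it.

```python
import csv
import os
import tempfile


class CsvThenConvertPipeline:
    """Hypothetical stand-in for a Scrapy item pipeline (no Scrapy needed)."""

    def __init__(self, path, on_closed):
        self.path = path
        self.on_closed = on_closed  # e.g. csv_2_xlsx in the real project

    def open_spider(self, spider):
        self.file = open(self.path, 'w', newline='', encoding='utf-8')
        self.writer = None

    def process_item(self, item, spider):
        if self.writer is None:
            # Use the first item's keys as the CSV header row
            self.writer = csv.DictWriter(self.file, fieldnames=list(item))
            self.writer.writeheader()
        self.writer.writerow(item)
        return item

    def close_spider(self, spider):
        # Close (and so flush) the file BEFORE handing it to the converter;
        # reading it while still open could see a partial or empty file.
        self.file.close()
        self.on_closed(self.path)


def run(pipeline, items, spider=None):
    """Call the hooks in the same order Scrapy does for one spider."""
    pipeline.open_spider(spider)
    for item in items:
        pipeline.process_item(item, spider)
    pipeline.close_spider(spider)
```

For example, passing a callback that reads the CSV back with `csv.DictReader` and running two items through `run(...)` gives the callback both rows, because nothing reads the file until after `self.file.close()`.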
