Reputation: 879
I just finished a project with Scrapy. My client wants the results as xlsx and, since I didn't find a way to export them directly in that format, I'm exporting to csv and then converting to xlsx (if this code can be improved, let me know :)
My problem is that when Python executes csv_2_xlsx(FILE_NAME), the result file does not exist yet. I tried adding a sleep but that didn't work.
Any help will be welcome :)
My main file looks like this:
# main.py
from scrapy.crawler import CrawlerProcess
from spiders import my_spider
from exporter import csv_2_xlsx

FILE_NAME = 'result.csv'

process = CrawlerProcess({
    'FEED_FORMAT': 'csv',
    'FEED_URI': FILE_NAME,
    'FEED_EXPORTERS': {
        'csv': 'exporter.FixLineCsvItemExporter',
    }
})
process.crawl(my_spider.MySpider)

# I think Python should block here until
# the crawl process ends
process.start()

# this line is not working because
# result.csv does not exist yet
csv_2_xlsx(FILE_NAME)
Upvotes: 0
Views: 560
Reputation: 383
Edited Version
I rearranged your code as follows to solve the issue of the csv file not being closed.
main.py
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
process = CrawlerProcess(get_project_settings())
process.crawl('spider_name')
process.start()
Pipeline.py
from scrapy.exporters import CsvItemExporter
from exporter import csv_2_xlsx

FILE_NAME = 'result.csv'

class TutorialPipeline(object):
    def __init__(self):
        self.file = open(FILE_NAME, 'wb')
        self.exporter = CsvItemExporter(self.file)
        self.exporter.start_exporting()

    def close_spider(self, spider):
        self.exporter.finish_exporting()
        self.file.close()
        # the file is closed before converting, so result.csv exists here
        csv_2_xlsx(FILE_NAME)

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item
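For the pipeline above to actually run, it also has to be enabled in the project's settings.py. The module path below is an assumption based on the class name; adjust it to your project layout:

```python
# settings.py -- 'tutorial.pipelines' is a hypothetical module path;
# replace it with the real dotted path to your pipeline class.
ITEM_PIPELINES = {
    'tutorial.pipelines.TutorialPipeline': 300,  # lower number = runs earlier
}
```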
Have you tried adding csv_2_xlsx(FILE_NAME) to the pipeline.py file? In the class definition in pipeline.py, add a close_spider() method and call csv_2_xlsx(FILE_NAME) inside it:
def close_spider(self, spider):
    csv_2_xlsx(FILE_NAME)
Upvotes: 1