Reputation: 2897
I am crawling with scrapy 1.5.1 by calling it from the CLI:
scrapy crawl test -o data/20181204_test.json -t json
My pipeline is pretty simple: I process the item, and after processing I want to put the output into a zip archive within the close_spider method:
class BidPipeline(object):

    def process_item(self, item, spider):
        return item

    def close_spider(self, spider):
        # trying to close the writing of the file
        self.exporter.finish_exporting()
        self.file.close()
        # zip the img and json files into an archive
        cleanup('test')
cleanup method:
import datetime
import os
import shutil
import zipfile

def cleanup(name):
    # create zip archive with all images inside
    filename = '../zip/' + datetime.datetime.now().strftime("%Y%m%d-%H%M") + '_' + name
    imagefolder = 'full'
    imagepath = '/Users/user/test_crawl/bid/images'
    shutil.make_archive(
        filename,
        'zip',
        imagepath,
        imagefolder
    )
    # delete images
    shutil.rmtree(imagepath + '/' + imagefolder)
    # add json file to zip archive
    filename_zip = filename + '.zip'
    archive = zipfile.ZipFile(filename_zip, 'a')
    path_to_file = '/Users/user/test_crawl/bid/data/' + datetime.datetime.now().strftime("%Y%m%d") + '_' + name + '.json'
    archive.write(path_to_file, os.path.basename(path_to_file))
    archive.close()
The traceback after using self.file.close():
AttributeError: 'BidPipeline' object has no attribute 'exporter'
2018-12-04 06:03:48 [scrapy.extensions.feedexport] INFO: Stored json feed (173 items) in: data/20181204_test.json
Without the file.close call there is no traceback and everything appears OK at first, but the json file in the archive is truncated.
End of decompressed file from zip archive with json file output from scrapy:
..a46.jpg"]},
json file output by scrapy:
a46.jpg"]}]
How do I close the writing of the file in order to zip it?
Upvotes: 0
Views: 276
Reputation: 1879
Try removing the line self.exporter.finish_exporting(). Your object does not have an exporter attribute.
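Removing that line avoids the AttributeError, but self.file is not defined on the pipeline either, and the log in the question suggests the -o feed is only stored after close_spider has run, which would explain the truncated copy in the archive. One possible workaround, sketched below under assumptions, is to let the pipeline write the JSON file itself so that it controls when the file is closed, and to zip it only afterwards. The class name JsonThenZipPipeline and both default paths are hypothetical and not taken from the question; the sketch uses only the Python 3 standard library and leaves out the image handling.

```python
import json
import os
import zipfile


class JsonThenZipPipeline(object):
    """Hypothetical pipeline: writes items to a JSON array, then zips the file."""

    def __init__(self, json_path="data/items.json", zip_path="zip/items.zip"):
        # Both paths are illustrative defaults, not taken from the question.
        self.json_path = json_path
        self.zip_path = zip_path

    def open_spider(self, spider):
        # The pipeline opens the file itself, so it also decides when to close it.
        self._ensure_parent_dir(self.json_path)
        self.file = open(self.json_path, "w")
        self.file.write("[")
        self.first = True

    def process_item(self, item, spider):
        # Separate items with commas so the result is a valid JSON array.
        if not self.first:
            self.file.write(",\n")
        self.first = False
        self.file.write(json.dumps(dict(item)))
        return item

    def close_spider(self, spider):
        # Finish and close the JSON file BEFORE adding it to the archive,
        # so that all buffered data is on disk.
        self.file.write("]")
        self.file.close()
        self._ensure_parent_dir(self.zip_path)
        with zipfile.ZipFile(self.zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
            archive.write(self.json_path, os.path.basename(self.json_path))

    @staticmethod
    def _ensure_parent_dir(path):
        parent = os.path.dirname(path)
        if parent:
            os.makedirs(parent, exist_ok=True)
```

To try it, the class would have to be enabled through the ITEM_PIPELINES setting under whatever module path your project uses, and the crawl would be run without -o, since the pipeline then produces the JSON file on its own.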
Upvotes: 1