merlin

Reputation: 2897

How to close the JSON output file after crawling with Scrapy?

I am crawling with Scrapy 1.5.1, running it from the command line:

scrapy crawl test -o data/20181204_test.json -t json 

My pipeline is simple: I process the item, and after processing I want to put the output into a zip archive in the close_spider method:

class BidPipeline(object):
    def process_item(self, item, spider):
        return item
    def close_spider(self, spider):
        # trying to close the writing of the file
        self.exporter.finish_exporting()
        self.file.close()
        # zip the img and json files into an archive
        cleanup('test')

The cleanup function:

import datetime
import os
import shutil
import zipfile


def cleanup(name):
    # create zip archive with all images inside
    filename = '../zip/' + datetime.datetime.now().strftime("%Y%m%d-%H%M") + '_' + name
    imagefolder = 'full'
    imagepath = '/Users/user/test_crawl/bid/images'
    shutil.make_archive(
        filename,
        'zip',
        imagepath,
        imagefolder
    )
    # delete images
    shutil.rmtree(imagepath + '/' + imagefolder)

    # add json file to zip archive
    filename_zip = filename + '.zip'
    path_to_file = '/Users/user/test_crawl/bid/data/' + datetime.datetime.now().strftime("%Y%m%d") + '_' + name + '.json'
    with zipfile.ZipFile(filename_zip, 'a') as zf:
        zf.write(path_to_file, os.path.basename(path_to_file))
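A possible refactor, sketched with the standard library only: the hypothetical archive_crawl() below takes the JSON path as a parameter instead of rebuilding it from datetime.now(), so it cannot drift from the name passed to -o (for example if the crawl runs past midnight), and it deletes the images only after the archive has been written. It does not by itself fix the truncation; it still has to be called once the feed file is complete.

```python
import datetime
import os
import shutil
import zipfile


def archive_crawl(name, json_path, image_root, image_folder, zip_dir):
    """Zip image_root/image_folder plus the JSON feed into one archive.

    Hypothetical refactor of cleanup(): json_path is passed in explicitly,
    so it always matches the file given to `scrapy crawl -o`.
    """
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M")
    base_name = os.path.join(zip_dir, stamp + "_" + name)

    # make_archive(base_name, format, root_dir, base_dir) returns the archive path
    archive_path = shutil.make_archive(base_name, "zip", image_root, image_folder)

    # Append the JSON feed under its bare file name
    with zipfile.ZipFile(archive_path, "a") as zf:
        zf.write(json_path, os.path.basename(json_path))

    # Delete the images only after the archive has been written
    shutil.rmtree(os.path.join(image_root, image_folder))
    return archive_path
```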

The error and log output with the finish_exporting() and file.close() calls in place:

AttributeError: 'BidPipeline' object has no attribute 'exporter'
2018-12-04 06:03:48 [scrapy.extensions.feedexport] INFO: Stored json feed (173 items) in: data/20181204_test.json

Without these two lines there is no traceback and everything appears OK at first, but the JSON file inside the archive is truncated.

End of the JSON file after extracting it from the zip archive:

..a46.jpg"]},

End of the JSON file as written by Scrapy:

a46.jpg"]}]
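The two endings above differ in how the array is terminated: the archived copy stops with a comma after the last item, while the finished file ends with the closing bracket. A JSON-array exporter typically writes the opening bracket first, then the items one by one, and the closing bracket only when exporting finishes, so a copy taken before that point is not valid JSON. The hypothetical ArrayWriter below is a minimal stand-in to illustrate this, not Scrapy's exporter.

```python
import io
import json


class ArrayWriter:
    """Hypothetical stand-in for an incremental JSON-array exporter.

    Writes "[" up front, one item at a time, and the closing "]"
    only when finish() is called.
    """

    def __init__(self, stream):
        self.stream = stream
        self.first = True
        self.stream.write("[")

    def write(self, item):
        # Separator goes before every item except the first
        if not self.first:
            self.stream.write(",\n")
        self.first = False
        self.stream.write(json.dumps(item))

    def finish(self):
        self.stream.write("]")


buf = io.StringIO()
writer = ArrayWriter(buf)
for i in range(3):
    writer.write({"id": i})

snapshot = buf.getvalue()      # what a copy taken now would contain
writer.finish()
complete = buf.getvalue()      # contents after finishing

print(snapshot.endswith("]"))  # False: the closing bracket is not written yet
print(json.loads(complete))    # [{'id': 0}, {'id': 1}, {'id': 2}]
```

Calling json.loads(snapshot) would raise json.JSONDecodeError, which is the same kind of breakage seen in the archived file.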

How do I make sure the file has been completely written and closed, so that I can zip it?
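In the log above, the AttributeError from close_spider is printed before the line "Stored json feed (173 items)", which suggests the feed exporter finishes the file after the pipeline's close_spider has already run. One workaround, assuming the zip step can move out of the pipeline, is to archive the file only after the scrapy process has exited. crawl_then_zip() is a hypothetical helper built on subprocess.run and zipfile:

```python
import os
import subprocess
import zipfile


def crawl_then_zip(cmd, json_path, archive_path):
    """Run the crawl as a child process, then zip its JSON feed.

    Assumption: once the scrapy process has exited, its feed
    exporter has finished writing and closed json_path.
    """
    # check=True raises CalledProcessError on a non-zero exit code,
    # so a failed crawl is never archived
    subprocess.run(cmd, check=True)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(json_path, os.path.basename(json_path))
    return archive_path
```

With the command from the question it would be called as crawl_then_zip(["scrapy", "crawl", "test", "-o", "data/20181204_test.json", "-t", "json"], "data/20181204_test.json", "20181204_test.zip"). The existing cleanup() could be called at that same point instead of the zipfile step.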

Upvotes: 0

Views: 276

Answers (1)

Guillaume

Reputation: 1879

Try removing the line self.exporter.finish_exporting().

Your object does not have an exporter attribute.
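Removing that line gets past this AttributeError, but self.file is never assigned in the pipeline either, so self.file.close() would fail the same way; the file written by -o is handled by Scrapy's feed export, not by the pipeline. An alternative, sketched here under the assumption that -o is dropped, is to let the pipeline open, write and close its own file through the open_spider, process_item and close_spider hooks, so close_spider knows the file is complete before zipping it. ArchivingJsonPipeline and its paths are hypothetical:

```python
import json
import os
import zipfile


class ArchivingJsonPipeline(object):
    """Hypothetical pipeline that writes its own JSON file and zips it.

    Because this class owns the file, close_spider can finish and
    close it before archiving.
    """

    json_path = "data/items.json"    # example paths, adjust as needed
    archive_path = "zip/items.zip"

    def open_spider(self, spider):
        self.file = open(self.json_path, "w", encoding="utf-8")
        self.file.write("[")
        self.first_item = True

    def process_item(self, item, spider):
        # Separator before every item except the first
        if not self.first_item:
            self.file.write(",\n")
        self.first_item = False
        self.file.write(json.dumps(dict(item)))
        return item

    def close_spider(self, spider):
        # Close the array and the file first, so it is complete on disk
        self.file.write("]")
        self.file.close()
        with zipfile.ZipFile(self.archive_path, "w") as zf:
            zf.write(self.json_path, os.path.basename(self.json_path))
```

It would be enabled through ITEM_PIPELINES in settings.py, e.g. {'bid.pipelines.ArchivingJsonPipeline': 300}, where the module path is an assumption based on the project folder name; the data/ and zip/ directories must already exist.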

Upvotes: 1
