marw

Reputation: 3119

Make in-memory copy of a zip by iterating over each file of the input

Python, as far as I know, does not allow modifying a file inside an archive in place. That is why I want to:

  1. Unpack the zip in memory (zip_in).
  2. Go over each file in the zip_in, and change it if needed, then copy it to zip_out. For now I'm happy with just making a copy of a file.
  3. Save zip_out.

I have been experimenting with zipfile and io, but with no luck, partly because I'm not sure how it all works and which object expects which kind of input or output.

Working Code

import os
import zipfile

# Make a copy of a zip file by iterating over
# each file in the zip_in archive.
#
# f is the input archive and fn is the output archive;
# each can be a path or a binary file-like object
# such as io.BytesIO.
#
# Check if a file is text, and in that case decode it
# to str so that it can be changed before it is written.

with zipfile.ZipFile(f, mode='r') as zip_in, \
        zipfile.ZipFile(fn, mode='w') as zip_out:
    for i in zip_in.infolist():
        # read() always returns the member's contents as bytes
        c = zip_in.read(i.filename)
        if os.path.splitext(i.filename)[1] in ('.xml', '.txt'):
            c = c.decode('utf-8')
        # writestr() accepts bytes, or str (which it encodes as UTF-8)
        zip_out.writestr(i.filename, c)

Old Example, With a Problem

# Make in-memory copy of a zip file
# by iterating over each file in zip_in
# archive.
#
# This code below does not work properly.

zip_in = zipfile.ZipFile(f, mode='a')
zip_out = zipfile.ZipFile(fn, mode='w')
for i in zip_in.filelist:
    bc = io.StringIO() # what about binary files?
    zip_in.extract(i.filename, bc)
    zip_out.writestr(i.filename, bc.read())
zip_out.close()

The error is TypeError: '_io.StringIO' object is not subscriptable

Upvotes: 1

Answers (1)

Matt Good

Reputation: 3127

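ZipFile.extract() writes the member to disk, and its second argument is the path of the directory to extract into, not a file-like object to write to. That is why passing an io.StringIO raises the TypeError. Instead, use ZipFile.read(name) to get the contents of the file. It returns a byte string, so it works fine with binary files. Text files may need to be decoded to unicode (str) before you modify them.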
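
As a rough sketch of that approach, something like the following should keep everything in memory by backing both archives with io.BytesIO. The function name copy_zip_in_memory and the text_exts parameter are made up for illustration, and the code assumes the text files are UTF-8 encoded.

```python
import io
import zipfile


def copy_zip_in_memory(data, text_exts=('.xml', '.txt')):
    """Return the bytes of a new zip built from the zip bytes in `data`.

    Hypothetical helper: `data` holds the raw bytes of the input archive.
    """
    out_buf = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data), mode='r') as zip_in, \
            zipfile.ZipFile(out_buf, mode='w',
                            compression=zipfile.ZIP_DEFLATED) as zip_out:
        for info in zip_in.infolist():
            # read() always returns bytes, so binary members are safe
            payload = zip_in.read(info.filename)
            if info.filename.lower().endswith(text_exts):
                # Assumption: text members are UTF-8 encoded
                text = payload.decode('utf-8')
                # ... change `text` here if needed ...
                payload = text.encode('utf-8')
            zip_out.writestr(info.filename, payload)
    # The archive is complete only after zip_out has been closed
    return out_buf.getvalue()
```

The returned bytes can be written to disk in one go, for example with open(fn, 'wb'), or passed along without ever touching the file system. Note that writing by name does not carry over per-file metadata such as timestamps.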

Upvotes: 2
