MJB

Reputation: 873

python - file duplicated when updating zip archive

I am trying to update a file in a zip archive and save the result as a new archive. The archive is an Excel .xlsm file, and the file I need to modify is in a subfolder: xl/vbaProject.bin. I wrote a function by modifying the one posted here: How to update one file inside zip file using python.

import ntpath
import os
import pathlib
import zipfile

def updateZip2(zip_name, file, data):
    # generate a temp file
    tmp = os.path.splitext(ntpath.basename(zip_name))[0] + '_new.xlsm'
    tmpname = str(pathlib.Path(zip_name).parent.joinpath(tmp))
    print(tmpname)

    with zipfile.ZipFile(zip_name, 'r') as zin:
        with zipfile.ZipFile(tmpname, 'w') as zout:
            zout.comment = zin.comment # preserve the comment
            for item in zin.infolist():
                if item.filename.find(file) == -1:
                    zout.writestr(item, zin.read(item.filename))

When I call this function like this: updateZip2('Book1.xlsm', r'xl\vbaProject.bin', target2) a new Book1_new.xlsm is created as expected, but I get the warning:

C:\ProgramData\Anaconda3\lib\zipfile.py:1355: UserWarning: Duplicate name: 'xl/vbaProject.bin'
  return self._open_to_write(zinfo, force_zip64=force_zip64)

and when I open the file with WinZip I can see that vbaProject.bin is duplicated. Any ideas why this happens, and how I can fix it so that every file in the zip is copied except xl\vbaProject.bin?

Upvotes: 5

Views: 4154

Answers (1)

Martin Evans

Reputation: 46779

The file that you are passing to updateZip2() is:

r'xl\vbaProject.bin'

but the files stored in the ZIP are of the form:

r'xl/vbaProject.bin'

So it should work if you change \ to / in your call:

updateZip2('Book1.xlsm', r'xl/vbaProject.bin', target2)
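
To see the mismatch directly, you can list the member names of an archive. The snippet below is only an illustrative sketch: it builds a throwaway in-memory archive (the dummy content is an assumption, not your real workbook) and shows that a search for the backslash form finds nothing, which is why the filter in the question never skips the old entry.

```python
import io
import zipfile

# Build a small in-memory archive; the content is a placeholder.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("xl/vbaProject.bin", b"dummy")

# Read the member names back as zipfile reports them.
with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()

print(names)                                # ['xl/vbaProject.bin']
print(names[0].find(r"xl\vbaProject.bin"))  # -1: backslash form is not found
print(names[0].find("xl/vbaProject.bin"))   # 0: forward-slash form matches
```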

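Alternatively, you could replace the find() test with a comparison of normalized paths. Note that this relies on os.path.normpath converting / to \, which it does only on Windows: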

if os.path.normpath(item.filename) != os.path.normpath(file):
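
If the code also needs to run on other platforms, one option is to normalize the requested name to forward slashes yourself and compare it exactly. The following is a minimal sketch, not the function from the question: the name replace_zip_member and the dst_path parameter are hypothetical, and it assumes the replacement data should be written once after the other members have been copied.

```python
import zipfile


def replace_zip_member(src_path, dst_path, member, data):
    """Copy src_path to dst_path, replacing one member with new data.

    Hypothetical helper for illustration; names are not from the question.
    """
    # ZIP member names use forward slashes, so normalize the requested
    # name before comparing it with the stored names.
    target = member.replace("\\", "/")

    with zipfile.ZipFile(src_path, "r") as zin, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as zout:
        zout.comment = zin.comment  # preserve the archive comment
        for item in zin.infolist():
            if item.filename != target:
                # Copy every other member unchanged.
                zout.writestr(item, zin.read(item.filename))
        # Write the replacement once, under the normalized name.
        zout.writestr(target, data)
```

It could then be called as replace_zip_member('Book1.xlsm', 'Book1_new.xlsm', r'xl\vbaProject.bin', target2), with either slash style accepted for the member name.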

Upvotes: 2
