samb

Reputation: 1745

python zipfile encoding for arcname

I'm trying to add several files to a zip archive with Python's zipfile library. The problem is the file name stored in the archive, which contains non-ASCII characters.

Here is a minimal example:

#!/usr/bin/env python

import zipfile

infilename = "test_file"
outfilename = "test.zip"
filename = u'Conf\xe9d\xe9ration.txt'

if __name__ == '__main__':
    # A zip archive is binary data, so open the output in "wb", not "w"
    f = open(outfilename, "wb")
    archive = zipfile.ZipFile(f, "w", zipfile.ZIP_DEFLATED)
    # Store the file under a CP437-encoded archive name
    archive.write(infilename, filename.encode("CP437"))
    archive.close()
    f.close()

The generated archive is not read correctly by every zip extractor.

I also tried without encoding to CP437, changing just one line to:

    archive.write(infilename, filename)

This time Ubuntu still has the same problem, Windows shows "Conf+®d+®ration.txt", and Mac OS X works perfectly.

Does anyone know a (Pythonic) cross-platform solution?

Upvotes: 2

Views: 3239

Answers (1)

Nickolay Olshevsky

Reputation: 14160

It looks like the file name is written as-is (i.e. in CP437 in your first attempt and in UTF-8 in your second), while the archive handlers each interpret it differently:

  • Windows: it uses the DOS/OEM encoding for file names inside the archive, which is why CP437 works. This behavior is described in the PKWARE standard.
  • Mac OS: it silently assumes UTF-8, which violates the standard. That is why UTF-8 works on Mac OS.
  • Linux/Unix: they use the system code page for file names inside the archive. I don't know which one your Linux installation is configured for, but apparently neither the DOS encoding nor UTF-8 :)
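
To make the mismatch concrete, here is a small sketch, assuming Python 3 (where `zipfile` takes `str` names rather than byte strings). It first shows how UTF-8 name bytes look when a reader decodes them with an OEM code page (CP850 is only an assumed example of what a Windows tool might use), then checks that Python 3's `zipfile` stores a non-ASCII name as UTF-8 and sets bit 11 of the general purpose bit flag, which the PKWARE specification defines as the UTF-8 marker. The helper `zip_with_name` and the constant `UTF8_FLAG` are names made up for this illustration.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # bit 11 of the general purpose bit flag (UTF-8 names)


def zip_with_name(arcname, payload=b"data"):
    """Build a one-entry archive in memory and return its raw bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(arcname, payload)
    return buf.getvalue()


name = "Conf\xe9d\xe9ration.txt"

# What a reader sees if it decodes UTF-8 name bytes with an OEM code page.
# CP850 is an assumption here; the actual code page depends on the system.
mojibake = name.encode("utf-8").decode("cp850")

# Write the archive, then inspect how the name was actually stored.
raw = zip_with_name(name)
with zipfile.ZipFile(io.BytesIO(raw)) as archive:
    info = archive.infolist()[0]

utf8_flag_set = bool(info.flag_bits & UTF8_FLAG)
stored_as_utf8 = name.encode("utf-8") in raw
stored_as_cp437 = name.encode("cp437") in raw

print(repr(mojibake))
print(utf8_flag_set, stored_as_utf8, stored_as_cp437)
```

Extractors that honor the flag should decode such a name as UTF-8 regardless of the platform's default code page. Tools that ignore it will still fall back to their own guess, so this is not guaranteed to work everywhere.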

Upvotes: 1
