matei

Reputation: 8705

What encoding does ZipArchive use to store file names inside the created archive?

I'm using the PHP ZipArchive class to generate a zip archive. I use the second parameter of the addFile method to set the name of the file inside the archive (since the real file on disk has a different name). Some of the names must contain French accented characters (such as é). When I download the archive, the accents aren't displayed correctly in the file names. What encoding should I use for the file names? (The application uses UTF-8.)

Upvotes: 6

Views: 7194

Answers (4)

Oleksii Kuznietsov

Reputation: 719

Use a DOS encoding. My file names contain Cyrillic characters, so I convert them from cp1251 (Windows) to cp866 (DOS) before passing them to $zip->addFile().

EDIT 2024-02-20: here is how I convert UTF-8 file names containing Cyrillic characters to the DOS code page on Linux.

function utf8cp866($t) {
    // No conversion is needed on Windows. Compare only the prefix:
    // a substring test for 'WIN' would also match 'Darwin' (macOS).
    if (strncasecmp(PHP_OS, 'WIN', 3) === 0) return $t;

    // Replace or strip the characters that cp866 cannot represent.
    $map = [
        "\u{0456}" => 'i', // Ukrainian і (U+0456) to Latin i. They look identical but are different characters; the DOS cp866 table doesn't support this one.
        "\u{0406}" => 'I', // Ukrainian І (U+0406) to Latin I
        '"' => '', // quotes can't be unzipped with a correct path :(
        '«' => '', // these characters do not exist in the DOS table
        '»' => '',
    ];
    return iconv('utf-8', 'cp866', strtr($t, $map));
}

Upvotes: 4

Thanh Trung

Reputation: 3804

It depends on the Windows system locale. For example, on a French system the built-in Windows zip tool uses IBM850 as the encoding.

Upvotes: 2

janedbal

Reputation: 156

This is PHP bug #53948; see the official bug report.

Suggested workaround (worked for me):

$zip->addFile($file, iconv("UTF-8", "CP852", $local_name));
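For context, here is a minimal sketch of that workaround inside a complete archive-creation flow. The helper name toArchiveName, the temporary paths, and the sample file name are assumptions made for illustration; they are not part of the bug report. The helper throws when iconv cannot convert the name, so a broken entry name is never written silently.

```php
<?php
// Hypothetical helper (not from the bug report): convert a UTF-8 entry
// name to a DOS code page, failing loudly if the conversion fails.
function toArchiveName(string $utf8Name, string $codePage = 'CP852'): string
{
    // iconv() returns false (and raises a notice) on failure.
    $converted = @iconv('UTF-8', $codePage, $utf8Name);
    if ($converted === false) {
        throw new InvalidArgumentException(
            "Name cannot be represented in $codePage"
        );
    }
    return $converted;
}

// Assumed temporary paths, for illustration only.
$source  = tempnam(sys_get_temp_dir(), 'src');
$archive = tempnam(sys_get_temp_dir(), 'zip');
file_put_contents($source, "contenu\n");

if (class_exists('ZipArchive')) {
    $zip = new ZipArchive();
    $flags = ZipArchive::CREATE | ZipArchive::OVERWRITE;
    if ($zip->open($archive, $flags) === true) {
        // The entry name is stored as CP852 bytes rather than UTF-8.
        $zip->addFile($source, toArchiveName("r\u{00E9}sum\u{00E9}.txt"));
        $zip->close();
    }
}
```

Whether the name then displays correctly still depends on the code page the extracting tool assumes, so CP852 may need to be swapped for another code page depending on the audience.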

Upvotes: 9

Ignacio Vazquez-Abrams

Reputation: 799082

Zip files don't have a specified encoding; the archive tool must guess (or assume) the encoding used. Try CP1252 first, then go from there.
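One way to narrow down the guess on the writing side is to check which candidate code pages can represent a given name at all. The sketch below does that with iconv. The helper name pickCodePage and the candidate list are assumptions for illustration, and it assumes an iconv implementation (such as glibc's) that fails on unrepresentable characters rather than substituting them.

```php
<?php
// Hypothetical helper: return the first candidate code page that iconv
// can convert the whole UTF-8 name into, or null if none works.
function pickCodePage(string $utf8Name, array $candidates): ?string
{
    foreach ($candidates as $codePage) {
        // iconv() returns false when the conversion fails.
        if (@iconv('UTF-8', $codePage, $utf8Name) !== false) {
            return $codePage;
        }
    }
    return null;
}

// Assumed candidate order: CP1252 first, then DOS code pages.
$candidates = ['CP1252', 'CP850', 'CP866'];

// A Cyrillic name, written with escapes so the source stays ASCII.
$codePage = pickCodePage("\u{0444}\u{0430}\u{0439}\u{043B}.txt", $candidates);
```

A successful conversion only shows that the name fits in that code page. It does not guarantee that the extracting tool will decode it with the same one, so testing with the target tool is still needed.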

Upvotes: 2
