Piotr Chowaniec

Reputation: 1230

Creating zip with directory containing special characters

I'm trying to create a zip archive with some directories inside. Some of the directories have Polish letters in their names, such as ą, ę, ł, etc. Everything looks fine, except that for every directory with a special letter in its name, an extra directory is created in the zip file. What is wrong with the following code?

import java.io.File;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException; 
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class Main {

  public static void main(String[] args) throws URISyntaxException, IOException {
    Map<String, String> env = new HashMap<>();
    env.put("create", "true");
    URI fileUri = new File("zipfs.zip").toPath().toUri();
    URI zipUri = new URI("jar:" + fileUri.getScheme(), fileUri.getPath(), null);

    try (FileSystem zipfs = FileSystems.newFileSystem(zipUri, env)) {

        Path directory = zipfs.getPath("ą");
        Files.createDirectory(directory);
        Path pathInZipfile = directory.resolve("someFile.txt");
        Path source = Paths.get("source.txt");

        Files.copy(source, pathInZipfile, StandardCopyOption.REPLACE_EXISTING);
    }

    // Reopen the archive and print every directory and file it contains;
    // try-with-resources closes the file system afterwards.
    try (FileSystem zipFs = FileSystems.newFileSystem(zipUri, Collections.emptyMap())) {

        Path root = zipFs.getPath("/");

        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path path, BasicFileAttributes attrs) throws IOException {
                System.out.println(path);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
                System.out.println(dir);
                return super.preVisitDirectory(dir, attrs);
            }
        });
    }
  }
}

The output of this program is as expected:

/
/ą/
/ą/someFile.txt

But when you open the created zip file, there are two directories inside:

Ä?  
ą

The first one is empty, and the text file is in the 'ą' directory, as it should be.

Upvotes: 3

Views: 1665

Answers (1)

David Duponchel

Reputation: 4069

It seems ZipFileSystem doesn't set the language encoding flag (EFS) for directory entries. This flag basically says "this path uses UTF-8".

Let's look at the archive with zipdetails (uninteresting lines omitted):

0072 CENTRAL HEADER #1     02014B50
007A General Purpose Flag  0000                       // <= no EFS flag
00A0 Filename              'ą/'

00AC CENTRAL HEADER #2     02014B50
00B4 General Purpose Flag  0800
     [Bits 1-2]            0 'Normal Compression'
     [Bit 11]              1 'Language Encoding'      // <= EFS flag
00DA Filename              'ą/someFile.txt'

Apart from the missing flag, ą/ is correctly encoded in UTF-8.
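If zipdetails is not at hand, the same check can be approximated in Java. The following is only a minimal sketch: the class name EfsFlagDump and the method efsFlags are made up for illustration, the field offsets are the ones visible in the dump above (flag at header + 8, filename at header + 46), names are decoded as UTF-8 regardless of the flag, and ZIP64 archives are assumed not to occur.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of any library.
final class EfsFlagDump {
    static final int EFS = 0x0800;            // bit 11 of the general purpose flag
    static final int END_SIG = 0x06054B50;    // end of central directory record
    static final int CENTRAL_SIG = 0x02014B50;

    // Maps each central directory entry name to whether its EFS bit is set.
    // Assumption: no ZIP64 structures.
    static Map<String, Boolean> efsFlags(byte[] zip) {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);

        // The end record is at least 22 bytes long; scan backwards for its signature.
        int end = zip.length - 22;
        while (end >= 0 && buf.getInt(end) != END_SIG) {
            end--;
        }
        if (end < 0) {
            throw new IllegalArgumentException("end of central directory not found");
        }
        int count = buf.getShort(end + 10) & 0xFFFF; // total number of entries
        int pos = buf.getInt(end + 16);              // offset of the first central header

        Map<String, Boolean> result = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            if (buf.getInt(pos) != CENTRAL_SIG) {
                throw new IllegalArgumentException("bad central header at offset " + pos);
            }
            int flag = buf.getShort(pos + 8) & 0xFFFF;
            int nameLen = buf.getShort(pos + 28) & 0xFFFF;
            int extraLen = buf.getShort(pos + 30) & 0xFFFF;
            int commentLen = buf.getShort(pos + 32) & 0xFFFF;
            String name = new String(zip, pos + 46, nameLen, StandardCharsets.UTF_8);
            result.put(name, (flag & EFS) != 0);
            pos += 46 + nameLen + extraLen + commentLen; // jump to the next header
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String file = args.length > 0 ? args[0] : "zipfs.zip";
        efsFlags(Files.readAllBytes(Paths.get(file)))
                .forEach((name, efs) -> System.out.println(name + "  EFS=" + efs));
    }
}
```

Run against the zipfs.zip from the question, this should report the directory entry without the flag and the file entry with it, mirroring the zipdetails dump.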

Without this flag, it's up to the program reading/extracting the zip file to choose an encoding (usually the system default). unzip doesn't work well here:

$ unzip -t zipfs.zip 
Archive:  zipfs.zip
    testing: -à/                      OK
    testing: ą/someFile.txt          OK
No errors detected in compressed data of zipfs.zip.

Note that if you disable Unicode support with -UU, you get the garbled name in both entries.

7z works better here (but only because my system default encoding is UTF-8):

$ 7z l zipfs.zip
...
   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2017-01-10 22:51:14 D....            0            0  ą
2017-01-10 22:51:15 .....            0            2  ą/someFile.txt
------------------- ----- ------------ ------------  ------------------------
2017-01-10 22:51:15                  0            2  1 files, 1 folders
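The difference between the two tools comes down to which charset each one assumes for the unflagged name. As a rough illustration, the sketch below encodes ą/ as UTF-8 and then decodes the same bytes as ISO-8859-1, standing in for "some single-byte default"; the class MojibakeDemo and its methods are hypothetical, and the charset a given tool really falls back to is an assumption. In code page 437, for instance, the same two bytes map to ─ and à, which would be consistent with the -à/ printed by unzip.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
final class MojibakeDemo {

    // Space-separated hex dump of the UTF-8 encoding of a name.
    static String hex(String name) {
        StringBuilder sb = new StringBuilder();
        for (byte b : name.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Decodes the UTF-8 bytes the way a reader assuming ISO-8859-1 would.
    static String misread(String name) {
        byte[] raw = name.getBytes(StandardCharsets.UTF_8);
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String name = "\u0105/"; // "ą/", written as an escape to avoid source-encoding issues
        System.out.println(hex(name));              // prints "C4 85 2F"
        System.out.println(misread(name).length()); // prints 3: one char per byte instead of 2
    }
}
```

The stored bytes are the same in both cases; only the reader's guess differs, which is why a UTF-8 system default makes 7z look correct.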

If you can neither control how the zip file is opened (for example, because it is sent to users rather than to one of your own servers) nor restrict folder names to ASCII characters, using a different library looks like the only solution.
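No replacement library is named above. One candidate, offered as an assumption rather than something tested against the original setup, is the JDK's own java.util.zip.ZipOutputStream: in the OpenJDK sources it sets the language encoding flag on every entry, directories included, when the stream is created with UTF-8. The class Utf8ZipWriter and the output file name zipstream.zip below are made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of any library.
final class Utf8ZipWriter {

    // Builds an in-memory zip with a directory "ą/" and a file "ą/someFile.txt".
    static byte[] buildZip(byte[] fileContent) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(out, StandardCharsets.UTF_8)) {
            // A name ending in '/' is treated as a directory entry.
            zos.putNextEntry(new ZipEntry("\u0105/"));
            zos.closeEntry();

            zos.putNextEntry(new ZipEntry("\u0105/someFile.txt"));
            zos.write(fileContent);
            zos.closeEntry();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = buildZip("hello".getBytes(StandardCharsets.UTF_8));
        Files.write(Paths.get("zipstream.zip"), zip);
    }
}
```

It is worth confirming the result with zipdetails or unzip -t on the target JDK before relying on it, since the flag behaviour is an implementation detail rather than something the API documentation promises.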

Upvotes: 3
