Reputation: 1230
I'm trying to create a zip archive with some directories inside. Some of the directories have Polish letters in their names, such as ą, ę, ł, etc. Everything looks fine, except that for every directory with a special letter in its name, another directory is created in the zip file. What is wrong with the following code?
import java.io.File;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class Main {
    public static void main(String[] args) throws URISyntaxException, IOException {
        Map<String, String> env = new HashMap<>();
        env.put("create", "true");
        URI fileUri = new File("zipfs.zip").toPath().toUri();
        URI zipUri = new URI("jar:" + fileUri.getScheme(), fileUri.getPath(), null);
        try (FileSystem zipfs = FileSystems.newFileSystem(zipUri, env)) {
            Path directory = zipfs.getPath("ą");
            Files.createDirectory(directory);
            Path pathInZipfile = directory.resolve("someFile.txt");
            Path source = Paths.get("source.txt");
            Files.copy(source, pathInZipfile, StandardCopyOption.REPLACE_EXISTING);
        }
        FileSystem zipFs = FileSystems.newFileSystem(zipUri, Collections.emptyMap());
        Path root = zipFs.getPath("/");
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path path, BasicFileAttributes attrs) throws IOException {
                System.out.println(path);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
                System.out.println(dir);
                return super.preVisitDirectory(dir, attrs);
            }
        });
    }
}
The output of this program is as expected:
/
/ą/
/ą/someFile.txt
But when you open the created zip file, there are two directories inside:
Ä?
ą
The first one is empty, and the text file is in the 'ą' directory, as it should be.
Upvotes: 3
Views: 1665
Reputation: 4069
It seems ZipFileSystem doesn't set the language encoding flag (EFS) for folders. This flag basically says "this path uses UTF-8".
Let's check with zipdetails (uninteresting lines skipped):
0072 CENTRAL HEADER #1 02014B50
007A General Purpose Flag 0000 // <= no EFS flag
00A0 Filename 'ą/'
00AC CENTRAL HEADER #2 02014B50
00B4 General Purpose Flag 0800
[Bits 1-2] 0 'Normal Compression'
[Bit 11] 1 'Language Encoding' // <= EFS flag
00DA Filename 'ą/someFile.txt'
Apart from the missing flag, the name ą/ is correctly encoded in UTF-8.
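If zipdetails is not at hand, the same check can be sketched in Java by reading the central directory directly. This is only a minimal sketch: the class EfsFlagReader and its method readEfsFlags are hypothetical names of mine, it does not handle ZIP64 archives, and it assumes the stored names are UTF-8 bytes (as they are in this archive) regardless of what the flag says.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the JDK or any library.
final class EfsFlagReader {
    private static final int EOCD_SIGNATURE = 0x06054b50; // end of central directory
    private static final int CEN_SIGNATURE = 0x02014b50;  // central directory header
    private static final int EFS_BIT = 1 << 11;           // 0x0800, "language encoding"

    /** Maps each central-directory entry name to whether bit 11 is set. */
    static Map<String, Boolean> readEfsFlags(byte[] zip) {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);

        // Scan backwards for the end-of-central-directory record (at least 22 bytes).
        int eocd = zip.length - 22;
        while (eocd >= 0 && buf.getInt(eocd) != EOCD_SIGNATURE) {
            eocd--;
        }
        if (eocd < 0) {
            throw new IllegalArgumentException("no end-of-central-directory record found");
        }
        int entryCount = buf.getShort(eocd + 10) & 0xFFFF;
        int pos = buf.getInt(eocd + 16); // offset of the first central header

        Map<String, Boolean> result = new LinkedHashMap<>();
        for (int i = 0; i < entryCount; i++) {
            if (buf.getInt(pos) != CEN_SIGNATURE) {
                throw new IllegalArgumentException("bad central header at offset " + pos);
            }
            int flags = buf.getShort(pos + 8) & 0xFFFF;
            int nameLen = buf.getShort(pos + 28) & 0xFFFF;
            int extraLen = buf.getShort(pos + 30) & 0xFFFF;
            int commentLen = buf.getShort(pos + 32) & 0xFFFF;
            // Assumption: decode the raw name as UTF-8 whether or not the flag is set.
            String name = new String(zip, pos + 46, nameLen, StandardCharsets.UTF_8);
            result.put(name, (flags & EFS_BIT) != 0);
            pos += 46 + nameLen + extraLen + commentLen;
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String file = args.length > 0 ? args[0] : "zipfs.zip";
        readEfsFlags(Files.readAllBytes(Paths.get(file)))
                .forEach((name, efs) -> System.out.println(name + " -> EFS " + (efs ? "set" : "not set")));
    }
}
```

Run against the zipfs.zip from the question, this should report the flag per entry in the same order as the central headers shown by zipdetails.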
Without this flag, it's up to the program reading/extracting the zip file to choose an encoding (usually the system default). unzip doesn't work well here:
$ unzip -t zipfs.zip
Archive: zipfs.zip
testing: -à/ OK
testing: ą/someFile.txt OK
No errors detected in compressed data of zipfs.zip.
Note that if you disable Unicode support with -UU, you get -à in both entries.
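To see why a reader that guesses the encoding shows a different name, here is a small sketch that decodes the same raw bytes under two assumptions. NameDecodingDemo and decode are hypothetical names, and ISO-8859-1 is only my stand-in for "some non-UTF-8 default"; unzip's -à looks more like a CP437-style reading, which I have not reproduced here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only illustrates decoding, not any zip tool's logic.
final class NameDecodingDemo {
    /** Decodes the raw entry-name bytes with the charset a reader happens to assume. */
    static String decode(byte[] rawName, Charset assumed) {
        return new String(rawName, assumed);
    }

    public static void main(String[] args) {
        // "\u0105" is the letter ą; in UTF-8 it is stored as the two bytes C4 85.
        byte[] raw = "\u0105/".getBytes(StandardCharsets.UTF_8);

        // A reader that honours the flag (or defaults to UTF-8) recovers the name.
        System.out.println(decode(raw, StandardCharsets.UTF_8));

        // A reader that assumes a single-byte charset sees two characters instead:
        // C4 becomes 'Ä' and 85 becomes an unprintable control character.
        System.out.println(decode(raw, StandardCharsets.ISO_8859_1));
    }
}
```

The second decoding is consistent with the "Ä?" directory from the question: the bytes in the archive are the same, only the reader's interpretation differs.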
7z works better here (but only because my system default encoding is UTF-8):
$ 7z l zipfs.zip
...
Date Time Attr Size Compressed Name
------------------- ----- ------------ ------------ ------------------------
2017-01-10 22:51:14 D.... 0 0 ą
2017-01-10 22:51:15 ..... 0 2 ą/someFile.txt
------------------- ----- ------------ ------------ ------------------------
2017-01-10 22:51:15 0 2 1 files, 1 folders
If you can neither control how the zip file is opened (for example, because it is sent to users rather than to one of your servers) nor restrict your folder names to ASCII characters, using a different library looks like the only solution.
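One candidate worth trying before reaching for a third-party library is java.util.zip.ZipOutputStream, which is a different JDK API rather than a different library. As far as I can tell it sets the flag on every entry, directories included, when constructed with UTF-8, but treat that as an assumption and verify the result with zipdetails on your JDK. A minimal sketch (Utf8ZipWriter and writeZip are hypothetical names):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration.
final class Utf8ZipWriter {
    /**
     * Writes one directory entry and one file inside it, then closes the stream.
     * Directory entries are plain entries whose name ends with '/'.
     */
    static void writeZip(OutputStream out, String dirName, String fileName, byte[] content)
            throws IOException {
        try (ZipOutputStream zos = new ZipOutputStream(out, StandardCharsets.UTF_8)) {
            zos.putNextEntry(new ZipEntry(dirName + "/"));
            zos.closeEntry();

            zos.putNextEntry(new ZipEntry(dirName + "/" + fileName));
            zos.write(content);
            zos.closeEntry();
        }
    }

    public static void main(String[] args) throws IOException {
        // Same layout as in the question: directory "ą" (\u0105) containing someFile.txt.
        byte[] content = Files.readAllBytes(Paths.get("source.txt"));
        writeZip(Files.newOutputStream(Paths.get("zipfs.zip")), "\u0105", "someFile.txt", content);
    }
}
```

The trade-off is that this API is write-only and sequential: you lose the Path-based convenience of ZipFileSystem (Files.copy, Files.createDirectory) and have to emit the entries yourself.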
Upvotes: 3