Ben Xu

Reputation: 1319

How to open a zip file generated by a Java program using UTF-8 encoding

Our product has an export function that uses ZipOutputStream to zip a directory. However, when the directory contains file names with Chinese or Japanese characters, the export doesn't work properly: the files inside the zip end up with different (garbled) names. Here is an example of our zipping code:

ZipOutputStream out = new ZipOutputStream(new FileOutputStream(zipFileName));
out.setEncoding("UTF-8");
//program to add directory to zip 
//program add/create file to zip
out.close();

My import algorithm, also built in Java, can import the zipped file correctly, even if it contains Chinese/Japanese characters in file/directory names.

 ZipFile zipFile = new ZipFile(zipPath, "UTF-8");
 Enumeration e = zipFile.getEntries();
 while (e.hasMoreElements()) {
     ZipEntry entry = (ZipEntry) e.nextElement();
     String name = entry.getName();
         ....

Is the unzip software having trouble reading the UTF-8 encoded file names, or is something special needed to create a zip file with UTF-8 names that existing software can open correctly?


I have written an example program:

package ZipFile;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;

import org.apache.tools.zip.ZipEntry;
import org.apache.tools.zip.ZipOutputStream;

public class ZipFolder{
public static void main(String[] a) throws Exception
{
String srcFolder = "D:/9.4_work/openscript_repo/中文124.All/中文";
String destZipFile = "D:/Eclipse_Projects/OpenScriptDebuggingProject/src/ZipFile/demo.zip";
zipFolder(srcFolder, destZipFile);
}

static public void zipFolder(String srcFolder, String destZipFile) throws Exception
{
    ZipOutputStream zip = null;
    FileOutputStream fileWriter = null;

    fileWriter = new FileOutputStream(destZipFile);
    zip = new ZipOutputStream(fileWriter);
    zip.setEncoding("UTF-8");
    // with GBK encoding, the Chinese names are displayed correctly after unzipping
    // zip.setEncoding("GBK");

    addFolderToZip("", srcFolder, zip);
    zip.flush();
    zip.close();
}

static private void addFileToZip(String path, String srcFile, ZipOutputStream zip) throws Exception
{

    File folder = new File(srcFile);
    if (folder.isDirectory()) {
        addFolderToZip(path, srcFile, zip);
    }
    else {
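        byte[] buf = new byte[1024];
        int len;
        FileInputStream in = new FileInputStream(srcFile);
        try {
            zip.putNextEntry(new ZipEntry(path + "/" + folder.getName()));
            while ((len = in.read(buf)) > 0) {
                zip.write(buf, 0, len);
            }
            // finish this entry before the next one is started
            zip.closeEntry();
        } finally {
            // release the file handle even if writing fails
            in.close();
        }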
    }
}

static private void addFolderToZip(String path, String srcFolder, ZipOutputStream zip) throws Exception
{
    File folder = new File(srcFolder);

    for (String fileName : folder.list()) {
        if (path.equals("")) {
            addFileToZip(folder.getName(), srcFolder + "/" + fileName, zip);
        }
        else {
            addFileToZip(path + "/" + folder.getName(), srcFolder + "/" + fileName, zip);
        }
    }
}

}

Upvotes: 4

Views: 15069

Answers (2)

sathish

Reputation: 11

The following utility class allows you to compress and decompress strings using the GZIP compression algorithm. This can be useful if you want to save long strings in a database for example.

import java.io.ByteArrayOutputStream;
import java.io.ByteArrayInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.GZIPInputStream;


public class GzipStringUtil {


    public static byte[] compressString(String uncompressedString) throws IllegalArgumentException, IllegalStateException {
        if (uncompressedString == null) {
            throw new IllegalArgumentException("The uncompressed string specified was null.");
        }
        try {
            byte[] utfEncodedBytes = uncompressedString.getBytes("UTF-8");
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            GZIPOutputStream gzipOutputStream = new GZIPOutputStream(baos);
            gzipOutputStream.write(utfEncodedBytes);
            gzipOutputStream.finish();
            gzipOutputStream.close();
            return baos.toByteArray();
        }
        catch (Exception e) {
            throw new IllegalStateException("GZIP compression failed: " + e, e);
        }
    }


    public static String uncompressString(byte[] compressedString) throws IllegalArgumentException, IllegalStateException {
        if (compressedString == null) {
            throw new IllegalArgumentException("The compressed string specified was null.");
        }
        try {
            ByteArrayInputStream bais = new ByteArrayInputStream(compressedString);
            GZIPInputStream gzipInputStream = new GZIPInputStream(bais);
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            for (int value = 0; value != -1;) {
                value = gzipInputStream.read();
                if (value != -1) {
                    baos.write(value);
                }
            }
            gzipInputStream.close();
            baos.close();
            return new String(baos.toByteArray(), "UTF-8");
        }
        catch (Exception e) {
            throw new IllegalStateException("GZIP uncompression failed: " + e, e);
        }
    }
}

Here is a TestCase which provides example use of the class above:

import java.util.Arrays;

import junit.framework.TestCase;

public class GzipStringUtilTest extends TestCase {

    public void testGzipStringUtil() {
        String input = "This is a test. This is a test. This is a test. This is a test. This is a test.";
        System.out.println("Input:        [" + input + "]");
        byte[] compressed = GzipStringUtil.compressString(input);
        System.out.println("Compressed:   " + Arrays.toString(compressed));
        System.out.println("-> Compressed input string of length " + input.length() + " to " + compressed.length + " bytes");
        String uncompressed = GzipStringUtil.uncompressString(compressed);
        System.out.println("Uncompressed: [" + uncompressed + "]");
        assertEquals("The uncompressed string [" + uncompressed + "] unexpectedly does not match the input string [" + input + "]", input, uncompressed);
        System.out.println("The input was compressed and uncompressed successfully, and the input matches uncompressed output.");
    }
}

Upvotes: 1

Nick H

Reputation: 11535

The top answer here may answer your question; unfortunately it seems to suggest that the Zip format doesn't really allow for creating a Zip file that will display filenames properly on any computer:

https://superuser.com/questions/60379/linux-zip-tgz-filenames-encoding-problem

I expect it works when you set the encoding to GBK because that is your system's default encoding, so 7zip uses it for every zip file it opens.

The linked answer also suggests that the rar and 7z formats have better support for non-ASCII filenames.
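The GBK observation above can be reproduced without any zip file at all. The following is only a sketch of the idea: it assumes the unzip tool decodes the stored name bytes with the system charset (GBK here), and the class and method names are hypothetical. It also assumes the running JDK ships the GBK charset, which full JDK distributions normally do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class NameDecodingSketch {

    /** Encodes a name as UTF-8, then decodes the same bytes with another charset,
     *  the way a tool that assumes the system default encoding would. */
    static String decodeUtf8BytesAs(String name, String charsetName) {
        byte[] stored = name.getBytes(StandardCharsets.UTF_8);
        return new String(stored, Charset.forName(charsetName));
    }

    /** Encodes and decodes a name with the same charset. */
    static String roundTrip(String name, String charsetName) {
        Charset charset = Charset.forName(charsetName);
        return new String(name.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // "\u4e2d\u6587" is the directory name from the question, written with
        // Unicode escapes so the source compiles under any file encoding.
        String name = "\u4e2d\u6587";
        // UTF-8 bytes read as GBK: the name comes out garbled.
        System.out.println(decodeUtf8BytesAs(name, "GBK"));
        // GBK bytes read as GBK: the name survives intact.
        System.out.println(roundTrip(name, "GBK"));
    }
}
```

If this assumption about the tool holds, it would explain why setEncoding("GBK") appears to work on a machine whose default encoding is GBK, and why the same archive would likely show garbled names on a machine with a different default.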

I found a blog entry specifically about UTF-8 in zips with Java. It suggests that a newer version of the ZIP specification covers UTF-8 file names; current versions of Java may not create zips that way, but Java 7 will. I don't know if the Apache classes use this too.

http://blogs.oracle.com/xuemingshen/entry/non_utf_8_encoding_in
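As a rough sketch of what the Java 7 route looks like: since Java 7, java.util.zip.ZipOutputStream and ZipInputStream have constructors that take a Charset. The example below uses those standard classes, not the org.apache.tools.zip classes from the question, and the class and method names are made up for illustration. It writes entries with UTF-8 names, checks bit 11 of the general purpose flags in the first local file header (the flag the ZIP specification uses to mark UTF-8 names), and reads the names back. Whether a particular unzip tool honours that flag is a separate matter and would need to be checked per tool.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, for illustration only.
class Utf8ZipSketch {

    /** Builds an in-memory zip whose entry names are encoded as UTF-8. */
    static byte[] zipWithUtf8Names(List<String> names) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // The two-argument constructor (Java 7+) selects the charset for entry names.
        try (ZipOutputStream zip = new ZipOutputStream(bytes, StandardCharsets.UTF_8)) {
            for (String name : names) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("demo".getBytes(StandardCharsets.UTF_8)); // placeholder content
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reports whether bit 11 of the general purpose flags is set in the first local file header. */
    static boolean utf8FlagSet(byte[] zip) {
        // Local file header layout: signature (bytes 0-3), version needed (4-5),
        // general purpose flags (6-7, little-endian).
        int flags = (zip[6] & 0xFF) | ((zip[7] & 0xFF) << 8);
        return (flags & 0x0800) != 0;
    }

    /** Reads the entry names back, decoding them as UTF-8. */
    static List<String> readNames(byte[] zip) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in =
                 new ZipInputStream(new ByteArrayInputStream(zip), StandardCharsets.UTF_8)) {
            for (ZipEntry entry = in.getNextEntry(); entry != null; entry = in.getNextEntry()) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Chinese and Japanese names, written as Unicode escapes so the
        // source compiles regardless of the file encoding.
        List<String> names = Arrays.asList("\u4e2d\u6587/readme.txt", "\u65e5\u672c\u8a9e.txt");
        byte[] zip = zipWithUtf8Names(names);
        System.out.println("UTF-8 flag set: " + utf8FlagSet(zip));
        System.out.println("Names round-trip: " + names.equals(readNames(zip)));
    }
}
```

The archive is built in memory only to keep the sketch self-contained; wrapping a FileOutputStream instead of the ByteArrayOutputStream would produce a file on disk that can be tried against the unzip tools in question.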

Upvotes: 1
