Matt Ball

Reputation: 359786

Working with Zip and GZip files in Java

It's been a while since I've done Java I/O, and I'm not aware of the latest "right" ways to work with Zip and GZip files. I don't necessarily need a full working demo - I'm primarily looking for the right interfaces and methods to be using. Yes, I could look up any random tutorial on this, but performance is an issue (these files can get pretty big) and I do care about using the best tool for the job.

The basic process I'll be implementing:

The input files might be compressed and archived more than once. For example, the "full extraction" should take any of the following inputs (I'm not in control of these), and leave behind foo.txt:

Then, I might be left with foo.txt, bar.mp3, baz.exe - so I would just add them all to a new zip file with some generic name.

Questions:

Upvotes: 17

Views: 25280

Answers (3)

Aaron Novstrup

Reputation: 21017

Note that TrueZip, the library suggested below, has been superseded by TrueVFS.


I've found the TrueZIP library useful. It allows you to treat archive files as if they're just another file system and use the familiar Java I/O APIs.

Unlike the java.util.zip API, TrueZIP provides random access to the contents of the archive, so file size should not be a concern. If I remember correctly, it will detect archive files and not try to redundantly compress them when you put them into an archive.

Quoting the TrueZIP page:

The TrueZIP API provides drop-in replacements for the well-known classes File, FileInputStream and FileOutputStream. This design makes TrueZIP very simple to use: All that is required to archive-enable most client applications is to add a few import statements for the package de.schlichtherle.io and add some type casts where required.

Now you can simply address archive files like directories in a path name. For example, the path name "archive.zip/readme" addresses the archive entry readme within the ZIP file archive.zip. Note that file name suffixes are fully configurable and TrueZIP automatically detects false positives and reverts back to treat them like ordinary files or directories. This works recursively, so an archive file may even be enclosed in another archive file, like in outer.zip/inner.zip/readme.

Upvotes: 8

dogbane

Reputation: 274562

Don't hold all this uncompressed data in memory, or you might run out of heap space. You need to stream the data out to file when uncompressing and then stream it back in from file when you want to create your final zip file.

I haven't worked with zip files before, but here is an example that shows how to decompress a gzipped file:

import java.io.*;
import java.util.zip.*;

public class Gunzip {
   public static void main(String[] args) throws IOException {
      // try-with-resources closes both streams, even if an exception is thrown
      try (InputStream in = new GZIPInputStream(new FileInputStream("file.txt.gz"));
           OutputStream out = new FileOutputStream("file.txt")) {
         byte[] buf = new byte[1024 * 4];
         int len;
         // read() returns -1 at end of stream; only one buffer is held in memory
         while ((len = in.read(buf)) != -1) {
            out.write(buf, 0, len);
         }
      }
   }
}

Upvotes: 11

Powerlord

Reputation: 88796

There may be a library somewhere to make this easy.

However, if there isn't, you can still do it the hard way with the java.util.zip classes: ZipFile or ZipInputStream, together with ZipEntry, for zip archives.
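As a rough sketch of the ZipInputStream and ZipEntry route, the following streams each entry of an archive straight to disk, so only a small buffer is held in memory at a time. The class name ZipExtractSketch and the method extractAll are made up for illustration, and the path check is an extra precaution against entry names that point outside the target directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper class, not part of any library.
class ZipExtractSketch {

    // Extracts every entry of zipFile into targetDir, one entry at a time.
    static void extractAll(Path zipFile, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        try (ZipInputStream zin = new ZipInputStream(Files.newInputStream(zipFile))) {
            ZipEntry entry;
            // getNextEntry() returns null once the archive is exhausted
            while ((entry = zin.getNextEntry()) != null) {
                Path dest = root.resolve(entry.getName()).normalize();
                // Refuse entry names such as "../evil" that escape the target directory
                if (!dest.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(dest);
                } else {
                    Files.createDirectories(dest.getParent());
                    // Copies only the current entry; the stream reports EOF at the entry's end
                    Files.copy(zin, dest, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ZipExtractSketch archive.zip outputDir
        extractAll(Path.of(args[0]), Path.of(args[1]));
    }
}
```

ZipFile is the alternative when the archive is a regular file on disk and you want to look entries up by name instead of reading them in order.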

For gzip, GZIPInputStream can wrap a FileInputStream. Keep in mind that gzip only works on single files.
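Since the inputs in the question can be wrapped more than once, something has to decide which stream class applies at each layer. One possible approach, sketched here as an assumption and not something I have used in production, is to peek at the leading bytes: gzip data starts with 0x1f 0x8b, and a zip file normally starts with a local file header, "PK" followed by 0x03 0x04. The class name FormatSniffSketch and the method sniff are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of any library.
class FormatSniffSketch {

    // Returns "gzip", "zip" or "other" based on the first bytes of the file.
    static String sniff(Path file) throws IOException {
        byte[] head = new byte[4];
        int n = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int r;
            // read() may return fewer bytes than requested, so loop until 4 bytes or EOF
            while (n < head.length && (r = in.read(head, n, head.length - n)) != -1) {
                n += r;
            }
        }
        if (n >= 2 && (head[0] & 0xff) == 0x1f && (head[1] & 0xff) == 0x8b) {
            return "gzip";
        }
        // Assumption: the archive is non-empty and has no leading data before the first header
        if (n >= 4 && head[0] == 'P' && head[1] == 'K' && head[2] == 3 && head[3] == 4) {
            return "zip";
        }
        return "other";
    }

    public static void main(String[] args) throws IOException {
        // Usage: java FormatSniffSketch someFile
        System.out.println(sniff(Path.of(args[0])));
    }
}
```

A full extraction could call this after each unwrapping step and stop once the result is "other".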

Both input stream classes have corresponding output stream classes: ZipOutputStream and GZIPOutputStream.
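For the writing side, here is a minimal sketch of building the final archive with ZipOutputStream, streaming each extracted file from disk into its own entry. The names ZipCreateSketch and zipFiles are invented for this example, and it assumes the input files all have distinct file names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, not part of any library.
class ZipCreateSketch {

    // Writes each input file into zipFile as a top-level entry named after the file.
    static void zipFiles(Path zipFile, List<Path> inputs) throws IOException {
        try (ZipOutputStream zout = new ZipOutputStream(Files.newOutputStream(zipFile))) {
            for (Path input : inputs) {
                // putNextEntry throws ZipException if two inputs share a file name
                zout.putNextEntry(new ZipEntry(input.getFileName().toString()));
                // Streams the file contents into the current entry
                Files.copy(input, zout);
                zout.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ZipCreateSketch out.zip foo.txt bar.mp3 baz.exe
        List<Path> inputs = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            inputs.add(Path.of(args[i]));
        }
        zipFiles(Path.of(args[0]), inputs);
    }
}
```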

Unfortunately, although I know of these classes, I've never actually used them, so I can't advise you any more than that.

Edit: The Zip functions do not appear to have any method for adding new files to a zip file without recreating the entire thing.

Upvotes: 3
