AKS

Reputation: 63

Checking if a stream is a zip file

We have a requirement to determine whether an incoming InputStream refers to a zip file or contains zip data. We do not have a reference to the underlying source of the stream. We aim to copy the contents of this stream into an OutputStream directed at an alternate location.

I tried reading the stream using ZipInputStream and extracting a ZipEntry. The ZipEntry is null if the stream is a regular file, as expected. However, checking for a ZipEntry consumes the first few bytes of the stream. So by the time I know that the stream is a regular stream, I have already lost its initial data.

Any thoughts on how to check whether the InputStream is an archive without losing data would be helpful.

Thanks.

Upvotes: 6

Views: 10418

Answers (5)

User0

Reputation: 564

This is how I did it.

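I use mark/reset to restore the stream if GZIPInputStream rejects the data (its constructor throws a ZipException when the header is not in gzip format). Note that this detects gzip-compressed data, not ZIP archives; GZIPInputStream cannot read .zip files.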

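import java.io.BufferedInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.ZipException;

/**
 * Wraps the input stream with GZIPInputStream if it starts with a gzip header.
 * @param inputStream the stream to inspect
 * @return a decompressing GZIPInputStream for gzip data, otherwise the stream rewound to its start
 * @throws IOException if reading from the underlying stream fails
 */
private InputStream wrapIfZip(InputStream inputStream) throws IOException {
    // mark/reset requires a stream that supports it; BufferedInputStream does.
    if (!inputStream.markSupported()) {
        inputStream = new BufferedInputStream(inputStream);
    }
    // Non-gzip data is rejected after the first few header bytes, well within this limit.
    inputStream.mark(1000);
    try {
        return new GZIPInputStream(inputStream);
    } catch (ZipException | EOFException e) {
        // Not gzip (ZipException) or too short to hold a header (EOFException): rewind.
        inputStream.reset();
        return inputStream;
    }
}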

Upvotes: 3

Nickolay Olshevsky

Reputation: 14160

You can check the first bytes of the stream for the ZIP local file header signature (PK 0x03 0x04, i.e. 0x50 0x4B 0x03 0x04); that is enough for most cases. If you need more precision, take the last ~100 bytes and check the fields of the end of central directory record.
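A minimal sketch of the signature check, assuming the caller has wrapped the raw stream in something that supports mark/reset (such as BufferedInputStream) so the peeked bytes are not lost. The class name `ZipSignature` and method `startsWithZipSignature` are hypothetical. It only looks for the local file header signature, so an empty archive (which starts with the end of central directory signature PK 0x05 0x06) is reported as not ZIP.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper class for illustration.
final class ZipSignature {
    // ZIP local file header signature: 'P' 'K' 0x03 0x04.
    private static final byte[] LOCAL_HEADER_SIG = {0x50, 0x4B, 0x03, 0x04};

    /**
     * Returns true if the next four bytes of the stream are the ZIP local file header
     * signature. The stream must support mark/reset; its position is restored afterwards.
     */
    static boolean startsWithZipSignature(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(LOCAL_HEADER_SIG.length);
        try {
            byte[] head = new byte[LOCAL_HEADER_SIG.length];
            int n = 0;
            // read() may return fewer bytes than requested, so loop until full or EOF.
            while (n < head.length) {
                int r = in.read(head, n, head.length - n);
                if (r == -1) {
                    return false; // fewer than four bytes: cannot start with a local header
                }
                n += r;
            }
            return Arrays.equals(head, LOCAL_HEADER_SIG);
        } finally {
            in.reset(); // rewind so the peeked bytes are read again by the copy
        }
    }
}
```

Usage: `InputStream in = new BufferedInputStream(rawStream);` then call `ZipSignature.startsWithZipSignature(in)` and copy `in` to the OutputStream as usual; the four peeked bytes are still at the front.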

Upvotes: 2

Piskvor left the building

Reputation: 92752

You have described java.io.PushbackInputStream: in addition to read(), it has unread(byte[]), which lets you push bytes back to the front of the stream and read() them again.

It has been in java.io since JDK 1.0 (though I admit I hadn't seen a use for it until today).
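A minimal sketch of that idea, assuming a ZIP is recognized by its four-byte local file header signature. The names `PushbackZipCheck` and `looksLikeZip` are hypothetical. The one real-API detail that matters is the pushback buffer size: the default constructor allows only one byte to be unread, so the stream must be created as `new PushbackInputStream(raw, 4)`.

```java
import java.io.IOException;
import java.io.PushbackInputStream;
import java.util.Arrays;

// Hypothetical helper class for illustration.
final class PushbackZipCheck {
    private static final byte[] ZIP_LOCAL_HEADER = {0x50, 0x4B, 0x03, 0x04};

    /**
     * Reads up to four bytes, unreads them, and reports whether they match the
     * ZIP local file header signature. The stream needs a pushback buffer of at
     * least four bytes, e.g. new PushbackInputStream(raw, 4).
     */
    static boolean looksLikeZip(PushbackInputStream in) throws IOException {
        byte[] head = new byte[ZIP_LOCAL_HEADER.length];
        int n = 0;
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r == -1) {
                break; // end of stream before four bytes were available
            }
            n += r;
        }
        if (n > 0) {
            in.unread(head, 0, n); // push back exactly the bytes that were read
        }
        return n == head.length && Arrays.equals(head, ZIP_LOCAL_HEADER);
    }
}
```

After the check, the same PushbackInputStream can be copied to the OutputStream and no bytes are missing.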

Upvotes: 1

Kim Burgaard

Reputation: 3538

It sounds a bit like a hack, but you could implement a proxy java.io.InputStream that sits between ZipInputStream and the stream you originally passed to ZipInputStream's constructor. The proxy copies everything it reads into a buffer until you know whether the data is a ZIP file. If it is not, the buffer saves your day: you replay the buffered bytes before the rest of the stream.
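A sketch of such a proxy, under the assumption that "is a ZIP" means ZipInputStream.getNextEntry() returns an entry. All class and method names (`RecordingZipProbe`, `RecordingInputStream`, `Probe`, `probe`) are hypothetical. The proxy is a FilterInputStream that records every byte it passes through; afterwards a SequenceInputStream replays the recorded bytes followed by whatever the probe did not consume, so the caller can copy the complete original data in either case.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.zip.ZipException;
import java.util.zip.ZipInputStream;

// Hypothetical helper class for illustration.
final class RecordingZipProbe {

    /** Proxy that passes reads through and keeps a copy of every byte it hands out. */
    static final class RecordingInputStream extends FilterInputStream {
        private final ByteArrayOutputStream recorded = new ByteArrayOutputStream();

        RecordingInputStream(InputStream in) { super(in); }

        @Override public int read() throws IOException {
            int b = in.read();
            if (b != -1) recorded.write(b);
            return b;
        }

        @Override public int read(byte[] buf, int off, int len) throws IOException {
            int n = in.read(buf, off, len);
            if (n > 0) recorded.write(buf, off, n);
            return n;
        }

        // Route skips through read() so skipped bytes are recorded as well.
        @Override public long skip(long n) throws IOException {
            byte[] tmp = new byte[(int) Math.min(Math.max(n, 0L), 8192L)];
            int r = read(tmp, 0, tmp.length);
            return Math.max(r, 0);
        }

        // mark/reset on the source would bypass the recording, so advertise no support.
        @Override public boolean markSupported() { return false; }

        byte[] recordedBytes() { return recorded.toByteArray(); }
    }

    /** Probe result: whether a ZIP entry was found, plus a stream of all original bytes. */
    static final class Probe {
        final boolean isZip;
        final InputStream replay;
        Probe(boolean isZip, InputStream replay) { this.isZip = isZip; this.replay = replay; }
    }

    static Probe probe(InputStream source) throws IOException {
        RecordingInputStream recorder = new RecordingInputStream(source);
        boolean isZip;
        try {
            // Deliberately not closed: closing ZipInputStream would close the source too.
            isZip = new ZipInputStream(recorder).getNextEntry() != null;
        } catch (ZipException e) {
            isZip = false;
        }
        // Replay the recorded bytes first, then continue with the unread rest of the source.
        InputStream replay = new SequenceInputStream(
                new ByteArrayInputStream(recorder.recordedBytes()), source);
        return new Probe(isZip, replay);
    }
}
```

Compared with mark/reset, this needs no read limit chosen in advance; the buffer simply grows to whatever ZipInputStream happened to read.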

Upvotes: 0

Galactus

Reputation: 826

Assuming your original InputStream is not buffered, I would try wrapping it in a BufferedInputStream before wrapping that in a ZipInputStream to check. You can use mark() and reset() on the BufferedInputStream to return to the initial position in the stream after your check.
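A minimal sketch of this approach. The names `MarkResetZipProbe`, `isZip` and `READ_LIMIT` are hypothetical, and the 64 KiB limit is an assumption: it has to cover everything ZipInputStream reads while parsing the first local file header (30 bytes plus the file name and extra field), otherwise BufferedInputStream.reset() throws an IOException. The ZipInputStream is intentionally left unclosed, because closing it would also close the BufferedInputStream you still want to copy from.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.util.zip.ZipException;
import java.util.zip.ZipInputStream;

// Hypothetical helper class for illustration.
final class MarkResetZipProbe {
    // Assumed bound on how many bytes the probe may consume before reset();
    // generous for ordinary archives.
    private static final int READ_LIMIT = 64 * 1024;

    /**
     * Returns true if ZipInputStream finds an entry at the start of the stream.
     * The stream is rewound to its starting position in all cases, so the caller
     * can copy every original byte afterwards.
     */
    static boolean isZip(BufferedInputStream in) throws IOException {
        in.mark(READ_LIMIT);
        try {
            // Not closed on purpose: closing ZipInputStream would close 'in' too.
            ZipInputStream zip = new ZipInputStream(in);
            return zip.getNextEntry() != null;
        } catch (ZipException e) {
            return false;
        } finally {
            in.reset(); // back to the marked start of the stream
        }
    }
}
```

Usage: `BufferedInputStream in = new BufferedInputStream(rawStream); boolean zip = MarkResetZipProbe.isZip(in);` then copy `in` to the OutputStream from the beginning.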

Upvotes: 6
