Reputation: 952
I have a web service that processes the contents of ZIP files, which I receive from a network source and stream back to a network target on the fly. This works great for about 60% of my test files, but the other 40% can't be processed, because zipEntry.getSize() returns -1 as the file size of every zip entry.
Below you can see two Java tests streaming contents from a source zip to a target zip. The first one accepts any InputStream as a source (which is what I need, as I get my data directly from the network) and fails when processing zip entries of unknown (-1) size.
The second test knows how to handle entries of unknown (-1) size, but can only handle streams originating from a local file (which is not what I need - it's only here to prove that the zip files in question are not corrupt).
There are plenty of examples online for handling local zip files, but very few about dealing with network streams, which is why I'm having a very hard time finding a solution to this.
The error thrown by the first example is Stream Zip files: java.util.zip.ZipException: invalid entry size (expected 0 but got 419 bytes)
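For background, a size of -1 right after getNextEntry() is normal whenever the entry's real sizes are stored in a trailing data descriptor rather than in the local file header; ZipInputStream only fills the size in once the entry's data has been fully consumed. A minimal, self-contained sketch of that behavior (the entry name and byte count are arbitrary; requires Java 11+ for transferTo and nullOutputStream):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class StreamedSizeDemo {
    public static void main(String[] args) throws IOException {
        // Build a tiny zip in memory. ZipOutputStream does not know the sizes
        // up front, so it sets flag bit 3 and writes a trailing data descriptor.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buf)) {
            zos.putNextEntry(new ZipEntry("a.txt"));
            zos.write(new byte[419]);
            zos.closeEntry();
        }
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
            ZipEntry entry = zis.getNextEntry();
            // -1: the local header carries no sizes, only the data descriptor does
            System.out.println("before reading: " + entry.getSize());
            long copied = zis.transferTo(OutputStream.nullOutputStream());
            // Now the descriptor has been parsed and the entry object updated
            System.out.println("after reading:  " + entry.getSize() + " (copied " + copied + ")");
        }
    }
}
```

So for streamed zips the -1 itself is not a corruption indicator; the entry object only becomes trustworthy after its data has been read to the end.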
Here's my code:
package de.ftk.threemf.mesh;

import lombok.extern.slf4j.Slf4j;
import org.apache.commons.io.IOUtils;
import org.junit.jupiter.api.Test;

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Date;
import java.util.Enumeration;
import java.util.zip.*;

@Slf4j
public class ZipStreamTests {

    @Test
    public void generalizedStreamZipTest() throws IOException {
        Path path = Path.of("testdata/brokenzip/trex.3mf");
        InputStream in = Files.newInputStream(path);
        OutputStream out = Files.newOutputStream(Path.of("testoutput/ziptest.3mf"));
        ZipInputStream zipInputStream = new ZipInputStream(in);
        ZipEntry zipEntry;
        CheckedOutputStream checkedOutputStream = new CheckedOutputStream(out, new Adler32());
        ZipOutputStream zipOutputStream = new ZipOutputStream(checkedOutputStream);
        while ((zipEntry = zipInputStream.getNextEntry()) != null) {
            log.info("zip file contains: {} modified on {}", zipEntry.getName(), new Date(zipEntry.getTime()));
            zipOutputStream.putNextEntry(zipEntry);
            log.info("expecting {} bytes", zipEntry.getSize());
            IOUtils.copy(zipInputStream, zipOutputStream);
            zipOutputStream.closeEntry();
            zipInputStream.closeEntry();
        }
        zipInputStream.close();
        zipOutputStream.finish();
        zipOutputStream.close();
        in.close();
        out.close();
    }

    @Test
    public void fileStreamZipTest() throws IOException {
        ZipFile zipFile = new ZipFile("testdata/brokenzip/trex.3mf");
        final ZipOutputStream zos = new ZipOutputStream(new FileOutputStream("testoutput/ziptest.3mf"));
        for (Enumeration<? extends ZipEntry> e = zipFile.entries(); e.hasMoreElements(); ) {
            ZipEntry entryIn = e.nextElement();
            log.info("zip file contains: {} modified on {}", entryIn.getName(), new Date(entryIn.getTime()));
            ZipEntry zipEntry = new ZipEntry(entryIn.getName());
            log.info("expecting {} bytes", zipEntry.getSize());
            zos.putNextEntry(zipEntry);
            InputStream is = zipFile.getInputStream(entryIn);
            byte[] buf = new byte[1024];
            int len;
            while ((len = is.read(buf)) > 0) {
                zos.write(buf, 0, len); // write only the bytes actually read, not the whole buffer
            }
            zos.closeEntry();
        }
        zos.close();
    }
}
Hint: A 3MF file is a ZIP file containing 3D models.
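One pragmatic workaround, if buffering the payload is acceptable, is to spool the network stream to a temporary file and open it with ZipFile: ZipFile reads the central directory, which always carries the real entry sizes. A sketch under that assumption (the class and method names here are mine, not part of the question; the main method fakes the network stream with an in-memory zip):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class ZipTempFileWorkaround {

    /** Spools the stream to a temp file so ZipFile can read the central directory. */
    static List<String> listEntriesWithSizes(InputStream networkIn) throws IOException {
        Path tmp = Files.createTempFile("zipspool", ".zip");
        List<String> result = new ArrayList<>();
        try {
            Files.copy(networkIn, tmp, StandardCopyOption.REPLACE_EXISTING);
            try (ZipFile zipFile = new ZipFile(tmp.toFile())) {
                for (Enumeration<? extends ZipEntry> e = zipFile.entries(); e.hasMoreElements(); ) {
                    ZipEntry entry = e.nextElement();
                    // never -1 here: sizes come from the central directory
                    result.add(entry.getName() + ":" + entry.getSize());
                }
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny in-memory zip to stand in for the network stream.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buf)) {
            zos.putNextEntry(new ZipEntry("hello.txt"));
            zos.write("hello".getBytes());
            zos.closeEntry();
        }
        System.out.println(listEntriesWithSizes(new ByteArrayInputStream(buf.toByteArray())));
        // prints [hello.txt:5]
    }
}
```

The obvious cost is that this is no longer a pure pass-through: the whole file hits disk once, which may or may not be acceptable for your service.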
Upvotes: 3
Views: 4313
Reputation: 176
Even in a release as recent as jdk-1.8.0_341, there still seems to be a bug related to ZIP64 in ZipInputStream.
It triggers when ZipInputStream interprets the data descriptor as 32-bit despite the local file header being ZIP64, and therefore reads the file size incorrectly. The culprit is here in the readEnd() method:
if ((flag & 8) == 8) {
    /* "Data Descriptor" present */
    if (inf.getBytesWritten() > ZIP64_MAGICVAL ||
        inf.getBytesRead() > ZIP64_MAGICVAL) {
        // ZIP64 format
        ...
A bug fix should look something like this:
if ((flag & 8) == 8) {
    /* "Data Descriptor" present */
    if (/* file header was ZIP64 */) {
        // ZIP64 format
        ...
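To make the failure mode concrete: ZIP64_MAGICVAL is 0xFFFFFFFF, the 32-bit sentinel meaning "the real value is in the ZIP64 field", so any entry smaller than 4 GiB fails both comparisons and lands in the 32-bit branch even when the local header declared ZIP64. A small illustration (the constant value matches java.util.zip.ZipConstants64; the byte counts are made up):

```java
public class Zip64CheckDemo {
    // Same value as the JDK-internal java.util.zip.ZipConstants64.ZIP64_MAGICVAL
    static final long ZIP64_MAGICVAL = 0xFFFFFFFFL;

    public static void main(String[] args) {
        long bytesWritten = 419; // uncompressed size of a small ZIP64 entry
        long bytesRead = 321;    // compressed size (arbitrary)
        // The buggy condition from readEnd(): for entries under 4 GiB it
        // picks the 32-bit descriptor layout regardless of the header.
        boolean readsZip64Descriptor =
                bytesWritten > ZIP64_MAGICVAL || bytesRead > ZIP64_MAGICVAL;
        System.out.println("treated as ZIP64: " + readsZip64Descriptor); // false
    }
}
```

Since the descriptor actually written was 8 bytes longer than the 32-bit layout, the parsed size ends up wrong, which is what produces the "expected 0 but got 419 bytes" message.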
The exception is only thrown once all of the file data has been read and ZipInputStream attempts to close and verify the ZipEntry sizes and CRC. So the stream may be manually advanced using reflection to recover from the error.
ZipInputStream zipStream = ...;
zipStream.getNextEntry();
try {
    // Read file data
    zipStream.read(...);
} catch (ZipException e) {
    // Recover from the error condition
    if (e.getMessage().startsWith("invalid entry size (expected 0 ")) {
        // Grab the pushback stream via reflection
        Field inF = FilterInputStream.class.getDeclaredField("in");
        inF.setAccessible(true);
        PushbackInputStream in = (PushbackInputStream) inF.get(zipStream);
        for (int i = 0; i < 8; i++) in.read(); // read 8 extra bytes to compensate for the 64-bit descriptor
        // Close the entry manually
        Field f = ZipInputStream.class.getDeclaredField("entryEOF");
        f.setAccessible(true);
        f.set(zipStream, true);
        f = ZipInputStream.class.getDeclaredField("entry");
        f.setAccessible(true);
        f.set(zipStream, null);
    } else {
        throw e;
    }
}
zipStream.getNextEntry(); // Continue as if the exception hadn't occurred
I created a complete packaged solution, available at the repository below. There are also some sample ZIPs under the test resources that will trigger the bug.
https://github.com/cjgriscom/ZipInputStreamPatch64
Upvotes: 0
Reputation: 4743
This is related to the ZIP64 subformat: https://www.ibm.com/support/pages/zip-file-unreadable-cause-javautilzipzipexception-invalid-entry-size
Newer Java 7 and Java 8 releases have this fixed - jdk-1.8.0_91 is not OK, openjdk-1.8.0.212.b04 is OK.
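Since whether you are affected depends on the exact JDK build, it may help to log the runtime version wherever the exception is caught; a trivial sketch using standard Java system properties:

```java
public class JdkBuildInfo {
    public static void main(String[] args) {
        // e.g. "1.8.0_91" (reportedly affected) vs. "1.8.0.212" builds (reportedly fixed)
        System.out.println("java.version         = " + System.getProperty("java.version"));
        System.out.println("java.vendor          = " + System.getProperty("java.vendor"));
        System.out.println("java.runtime.version = " + System.getProperty("java.runtime.version"));
    }
}
```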
Upvotes: 1