Macryo

Reputation: 188

Java ZipInputStream throws zip.ZipException: invalid distance too far back while parsing nested zip files

I'll start off by acknowledging that I've read several threads here and elsewhere on the internet, but my problem persists and seems to be something different.

I have a zip file which contains several .txt files, directories, subdirectories of those directories, and so on. There are also plenty of zip archives inside, which in turn contain zips, directories and files. The deepest level of archivization is 5 steps: 5 zips, one inside another, each with other files alongside it.

I have this code:

ZipFile zipFile = new ZipFile(Objects.requireNonNull(this.classLoader.getResource("inputs.zip")).getFile());
Enumeration<? extends ZipEntry> entries = zipFile.entries();
while (entries.hasMoreElements()) {
    ZipEntry entry = entries.nextElement();
    InputStream stream = zipFile.getInputStream(entry);
    System.out.println(entry.getName());
    processZipFile(stream);
}

and here's processZipFile:

private void processZipFile(InputStream stream) throws IOException {
    ZipInputStream zipInputStream = new ZipInputStream(stream);
    ZipEntry zipEntry = zipInputStream.getNextEntry();
    while (zipEntry != null) {
        System.out.print("    /" + zipEntry.getName());
        if (zipEntry.getName().endsWith(".zip")) {
            processZipFile(stream);
        }
        zipEntry = zipInputStream.getNextEntry();
    }
}

Up to level 3 of archivization everything seems to work fine: all the directories, zips, gzips and subdirectories are listed. But when it comes to handling something like inputs.zip/1.zip/2.zip, it throws an exception:

Exception in thread "main" java.util.zip.ZipException: invalid distance too far back

The Java 8 docs say of ZipInputStream.getNextEntry(): "Reads the next ZIP file entry and positions the stream at the beginning of the entry data." I mention this because the program throws the exception right after getting the entry.

In this particular case, the file inside "2.zip" is rather big: 800 MB, compared with a maximum of 3 MB in the other cases. I wonder if that might affect the program.

I'm trying to do all of this without unpacking the zips; that's really important here. I'm aware this kind of error is commonly related to corrupted zip files, but these ones are perfectly valid.

So my question is: how can I go through all of these nested zip files?

EDIT/SOLUTION:

Following the change proposed by talex, I fixed my code to work on ZipInputStreams rather than plain InputStreams. It no longer threw errors, but it was still skipping nested zips deeper than 3 levels of archivization (still not sure if that's the right term, lol). The solution to this was also simple: I wrap the ZipInputStream in another ZipInputStream when passing it recursively to my function. Here's the code:

private void processZipFile(ZipInputStream zipInputStream) throws IOException {
    ZipEntry zipEntry;
    while ((zipEntry = zipInputStream.getNextEntry()) != null) {
        System.out.println("    " + zipEntry.getName());
        if (zipEntry.getName().endsWith(".zip")) {
            // Wrap the current stream so the nested archive is parsed from this entry's data.
            processZipFile(new ZipInputStream(zipInputStream));
        } else if (zipEntry.getName().endsWith(".txt")) {
            // other things to do...
        }
        // other things to do...
    }
}
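
For anyone who wants to try the wrapping approach in isolation, here is a minimal, self-contained sketch. It is not my real code: the class name NestedZipWalker, the zip(...) helper and the in-memory test archive are made up for the demo, and it collects paths into a list instead of printing them. It builds a three-level archive in memory and walks it without extracting anything to disk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class NestedZipWalker {

    // Recursively collects entry paths. Nested archives are read in place.
    static void walk(ZipInputStream zin, String prefix, List<String> out) throws IOException {
        ZipEntry entry;
        while ((entry = zin.getNextEntry()) != null) {
            String path = prefix + "/" + entry.getName();
            out.add(path);
            if (!entry.isDirectory() && entry.getName().endsWith(".zip")) {
                // Reads on zin are limited to the current entry's decompressed data,
                // so the wrapper sees exactly the bytes of the nested archive.
                // The wrapper is deliberately not closed: closing it would also
                // close the outer stream.
                walk(new ZipInputStream(zin), path, out);
            }
        }
    }

    // Hypothetical demo helper: builds a zip in memory containing one text
    // file and, optionally, one nested zip.
    static byte[] zip(String textName, String nestedName, byte[] nested) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry(textName));
            zos.write("hello".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
            if (nested != null) {
                zos.putNextEntry(new ZipEntry(nestedName));
                zos.write(nested);
                zos.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    // Builds inputs.zip -> 1.zip -> 2.zip in memory and returns every path found.
    static List<String> demo() throws IOException {
        byte[] level2 = zip("c.txt", null, null);
        byte[] level1 = zip("b.txt", "2.zip", level2);
        byte[] top = zip("a.txt", "1.zip", level1);
        List<String> paths = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(top))) {
            walk(zin, "inputs.zip", paths);
        }
        return paths;
    }

    public static void main(String[] args) throws IOException {
        demo().forEach(System.out::println);
    }
}
```

Only the outermost stream is closed, in the try-with-resources block. When the inner wrappers run out of entries, the next getNextEntry() call on the outer stream skips whatever is left of the nested archive's entry.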

Upvotes: 0

Views: 4224

Answers (1)

talex

Reputation: 20455

Instead of

processZipFile(stream);

you need to use

processZipFile(zipInputStream);
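
The likely reason, as far as I can tell: stream is the raw stream that zipInputStream is already reading from, so a second ZipInputStream created over it starts at an arbitrary position and also takes bytes away from the first one, which would explain the inflater complaining. zipInputStream itself, once positioned on an entry, hands out only that entry's decompressed data, which for a .zip entry is exactly the nested archive. A small sketch of that behaviour follows; the class name EntryBytesDemo and the note.txt entry are invented for the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class EntryBytesDemo {

    // Writes a one-entry zip in memory, then reads that entry back through
    // ZipInputStream and returns its content as text.
    static String firstEntryText() throws IOException {
        ByteArrayOutputStream archive = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(archive)) {
            zos.putNextEntry(new ZipEntry("note.txt"));
            zos.write("payload".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }

        try (ZipInputStream zin =
                 new ZipInputStream(new ByteArrayInputStream(archive.toByteArray()))) {
            zin.getNextEntry(); // position the stream at the entry data
            ByteArrayOutputStream content = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            // read() returns -1 at the end of the current entry, not at the
            // end of the archive.
            while ((n = zin.read(buf)) != -1) {
                content.write(buf, 0, n);
            }
            return new String(content.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(firstEntryText());
    }
}
```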

Upvotes: 4
