freb

Reputation: 1169

Handling nested zip files with archive/zip

I'm struggling to handle nested zip files in Go (where a zip file contains another zip file). I'm trying to recurse through a zip file and list all of the files it contains, including those inside nested zips.

archive/zip gives you two functions for opening a zip file:

  • OpenReader opens a file on disk.
  • NewReader accepts an io.ReaderAt and a file size.

As you iterate through the zipped files with either of these, you get a zip.File for each file inside the zip. To get the contents of file f, you call f.Open, which gives you an io.ReadCloser. To open a nested zip file, I'd need to use NewReader, but neither zip.File nor the io.ReadCloser returned by f.Open satisfies the io.ReaderAt interface.
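For reference, the iteration described above might look roughly like the sketch below. It assumes the archive is already in memory as a byte slice; buildSample and listEntries are made-up helper names, not part of archive/zip. Each entry is consumed as a forward-only stream, which is exactly what makes the nested case awkward.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
)

// buildSample (hypothetical helper) creates a small in-memory zip with two entries.
func buildSample() []byte {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	for _, name := range []string{"a.txt", "b.txt"} {
		f, err := w.Create(name)
		if err != nil {
			panic(err)
		}
		if _, err := f.Write([]byte("hello " + name)); err != nil {
			panic(err)
		}
	}
	if err := w.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

// listEntries (hypothetical helper) opens every entry and reports its name
// and the number of bytes read from its stream.
func listEntries(data []byte) ([]string, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, err
	}
	var out []string
	for _, f := range zr.File {
		// f.Open is declared to return an io.ReadCloser, so all the
		// caller can rely on is sequential Read and Close.
		rc, err := f.Open()
		if err != nil {
			return nil, err
		}
		n, err := io.Copy(io.Discard, rc)
		rc.Close()
		if err != nil {
			return nil, err
		}
		out = append(out, fmt.Sprintf("%s (%d bytes)", f.Name, n))
	}
	return out, nil
}

func main() {
	entries, err := listEntries(buildSample())
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Println(e)
	}
}
```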

zip.File has a private field zipr, which is an io.ReaderAt, and zip.ReadCloser (the type returned by OpenReader) has a private field f, which is an *os.File. Either of these should satisfy the requirements for NewReader.

My question: is there any way to open a nested zip file without first writing the contents to a file on disk or reading the whole thing into memory?

It looks like everything that is needed is available in zip.File, but isn't exported. I'm hoping I missed something.

Upvotes: 3

Views: 2352

Answers (2)

Alix Axel

Reputation: 154543

I ran into the exact same need and came up with the following approach. I'm not sure if it's any help to you:

import (
    "archive/zip"
    "bytes"
    "io"
)

// NewZipFromReader returns a *zip.Reader for file. If file already
// implements io.ReaderAt it is used directly with the given size.
// Otherwise the whole stream is read into memory and size is ignored.
func NewZipFromReader(file io.ReadCloser, size int64) (*zip.Reader, error) {
    if ra, ok := file.(io.ReaderAt); ok {
        return zip.NewReader(ra, size)
    }

    // io.ReadAll needs Go 1.16+; use ioutil.ReadAll on older versions.
    buffer, err := io.ReadAll(file)
    if err != nil {
        return nil, err
    }

    return zip.NewReader(bytes.NewReader(buffer), int64(len(buffer)))
}

So if file doesn't implement io.ReaderAt, it reads the whole contents into a buffer.

It probably doesn't handle ZIP bombs safely, and it will definitely fail with an out-of-memory error for files larger than available RAM.

Upvotes: 0

Caleb

Reputation: 9458

How about building an io.ReaderAt on top of an io.Reader that reinitializes the underlying stream whenever you go backwards? (This code is largely untested, but hopefully you get the idea.)

package main

import (
    "io"
    "os"
    "strings"
)

// inefficientReaderAt adapts a forward-only stream to io.ReaderAt by
// reopening the stream whenever a read targets an earlier offset.
type inefficientReaderAt struct {
    rdr    io.ReadCloser
    cur    int64 // offset of the next byte rdr will return
    initer func() (io.ReadCloser, error)
}

func newInefficientReaderAt(initer func() (io.ReadCloser, error)) *inefficientReaderAt {
    return &inefficientReaderAt{
        initer: initer,
    }
}

func (r *inefficientReaderAt) Read(p []byte) (n int, err error) {
    n, err = r.rdr.Read(p)
    r.cur += int64(n)
    return n, err
}

func (r *inefficientReaderAt) ReadAt(p []byte, off int64) (n int, err error) {
    // reset on rewind
    if off < r.cur || r.rdr == nil {
        if r.rdr != nil {
            r.rdr.Close() // release the stream we are abandoning
        }
        r.cur = 0
        r.rdr, err = r.initer()
        if err != nil {
            return 0, err
        }
    }

    // skip forward to the requested offset, keeping cur in sync
    if off > r.cur {
        sz, err := io.CopyN(io.Discard, r.rdr, off-r.cur)
        r.cur += sz
        if err != nil {
            // nothing was read into p, so report zero bytes
            return 0, err
        }
    }

    return r.Read(p)
}

func main() {
    r := newInefficientReaderAt(func() (io.ReadCloser, error) {
        return io.NopCloser(strings.NewReader("ABCDEFG")), nil
    })

    io.Copy(os.Stdout, io.NewSectionReader(r, 0, 3))
    io.Copy(os.Stdout, io.NewSectionReader(r, 1, 3))
}

If you mostly move forwards, this probably works OK, especially if you use a buffered reader.

  • I should note that this violates the io.ReaderAt guarantees (https://godoc.org/io#ReaderAt): it doesn't allow parallel calls to ReadAt, and it can return fewer than len(p) bytes without an error instead of blocking until the read is complete, so this may not even work properly.

Upvotes: 2
