Reputation: 1169
I'm struggling to handle nested zip files in Go (where a zip file contains another zip file). I'm trying to recurse a zip file and list all of the files it contains.
archive/zip gives you two ways to open a zip file: OpenReader opens a file on disk, and NewReader accepts an io.ReaderAt and a file size. With either one, you iterate through the zipped files and get a zip.File for each file inside the archive. To get the contents of a file f, you call f.Open, which gives you an io.ReadCloser. To open a nested zip file I'd need to use NewReader, but neither zip.File nor the io.ReadCloser returned by f.Open satisfies the io.ReaderAt interface.
zip.File has a private field zipr, which is an io.ReaderAt, and zip.ReadCloser has a private field f, which is an *os.File; both should satisfy the requirements of NewReader.
My question: is there any way to open a nested zip file without first writing the contents to a file on disk, or reading the whole thing into memory?
It looks like everything that is needed is available in zip.File, but isn't exported. I'm hoping I missed something.
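To illustrate, here is roughly the traversal I have in mind (just a sketch; "outer.zip" is a placeholder path, and the in-memory fallback in the middle is exactly the step I'd like to avoid):

package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io/ioutil"
	"strings"
)

// listZip prints every entry in r, recursing into entries that are
// themselves zip files. Because f.Open only gives an io.ReadCloser, the
// nested case currently buffers the whole inner archive in memory.
func listZip(r *zip.Reader, prefix string) error {
	for _, f := range r.File {
		fmt.Println(prefix + f.Name)
		if !strings.HasSuffix(f.Name, ".zip") {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			return err
		}
		buf, err := ioutil.ReadAll(rc) // the step I want to get rid of
		rc.Close()
		if err != nil {
			return err
		}
		inner, err := zip.NewReader(bytes.NewReader(buf), int64(len(buf)))
		if err != nil {
			return err
		}
		if err := listZip(inner, prefix+f.Name+"/"); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	zr, err := zip.OpenReader("outer.zip") // placeholder path
	if err != nil {
		panic(err)
	}
	defer zr.Close()
	if err := listZip(&zr.Reader, ""); err != nil {
		panic(err)
	}
}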
Upvotes: 3
Views: 2352
Reputation: 154543
I ran into the exact same need and came up with the following approach; not sure if it's any help to you:
// NewZipFromReader opens a zip archive from an io.ReadCloser of the given
// size. If the reader does not already implement io.ReaderAt, the whole
// contents are buffered in memory first.
func NewZipFromReader(file io.ReadCloser, size int64) (*zip.Reader, error) {
	var in io.Reader = file
	if _, ok := in.(io.ReaderAt); !ok {
		// Not seekable: fall back to buffering the entire archive in memory.
		buffer, err := ioutil.ReadAll(in)
		if err != nil {
			return nil, err
		}
		in = bytes.NewReader(buffer)
		size = int64(len(buffer))
	}
	reader, err := zip.NewReader(in.(io.ReaderAt), size)
	if err != nil {
		return nil, err
	}
	return reader, nil
}
So if file doesn't implement io.ReaderAt, it reads the whole contents into a buffer. It's probably not safe against ZIP bombs, and it will definitely fail with OOM for files larger than RAM.
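For the nested-zip case the call could look roughly like this (untested sketch; "outer.zip" and the .zip suffix check are just placeholders, and NewZipFromReader is the function above):

package main

import (
	"archive/zip"
	"fmt"
	"log"
	"strings"
)

func main() {
	zr, err := zip.OpenReader("outer.zip") // placeholder path
	if err != nil {
		log.Fatal(err)
	}
	defer zr.Close()

	for _, f := range zr.File {
		if !strings.HasSuffix(f.Name, ".zip") {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			log.Fatal(err)
		}
		// The io.ReadCloser from f.Open is not an io.ReaderAt, so
		// NewZipFromReader falls back to buffering the inner zip in memory.
		inner, err := NewZipFromReader(rc, int64(f.UncompressedSize64))
		if err != nil {
			log.Fatal(err)
		}
		for _, nf := range inner.File {
			fmt.Println(f.Name + "/" + nf.Name)
		}
		rc.Close()
	}
}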
Upvotes: 0
Reputation: 9458
How about an io.ReaderAt built from an io.Reader that reinitializes if you decide to go backwards? (This code is largely untested, but hopefully you get the idea.)
package main

import (
	"io"
	"io/ioutil"
	"os"
	"strings"
)

// inefficientReaderAt adapts a re-openable io.Reader into an io.ReaderAt by
// discarding bytes to seek forwards and re-initializing the reader to seek
// backwards.
type inefficientReaderAt struct {
	rdr    io.ReadCloser
	cur    int64
	initer func() (io.ReadCloser, error)
}

func newInefficientReaderAt(initer func() (io.ReadCloser, error)) *inefficientReaderAt {
	return &inefficientReaderAt{
		initer: initer,
	}
}

func (r *inefficientReaderAt) Read(p []byte) (n int, err error) {
	n, err = r.rdr.Read(p)
	r.cur += int64(n)
	return n, err
}

func (r *inefficientReaderAt) ReadAt(p []byte, off int64) (n int, err error) {
	// Rewinding (or first use): reset and start over from a fresh reader.
	if off < r.cur || r.rdr == nil {
		r.cur = 0
		r.rdr, err = r.initer()
		if err != nil {
			return 0, err
		}
	}
	// Seek forwards by discarding the bytes in between.
	if off > r.cur {
		skipped, err := io.CopyN(ioutil.Discard, r.rdr, off-r.cur)
		r.cur += skipped
		if err != nil {
			return 0, err
		}
	}
	return r.Read(p)
}

func main() {
	r := newInefficientReaderAt(func() (io.ReadCloser, error) {
		return ioutil.NopCloser(strings.NewReader("ABCDEFG")), nil
	})
	io.Copy(os.Stdout, io.NewSectionReader(r, 0, 3))
	io.Copy(os.Stdout, io.NewSectionReader(r, 1, 3))
}
If you mostly move forwards, this probably works OK, especially if you use a buffered reader.
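To tie this back to the nested-zip question, the adapter could be wired up roughly like this (also untested; "outer.zip" and "inner.zip" are placeholder names, and given the caveat below about the io.ReaderAt contract, treat it as a sketch). It re-opens the inner entry from the outer archive whenever the reader has to rewind:

package main

import (
	"archive/zip"
	"fmt"
	"io"
	"log"
)

func main() {
	outer, err := zip.OpenReader("outer.zip") // placeholder path
	if err != nil {
		log.Fatal(err)
	}
	defer outer.Close()

	for _, f := range outer.File {
		if f.Name != "inner.zip" { // placeholder entry name
			continue
		}
		f := f // capture for the closure below
		ra := newInefficientReaderAt(func() (io.ReadCloser, error) {
			// Restart from offset 0 by re-opening the entry.
			return f.Open()
		})
		inner, err := zip.NewReader(ra, int64(f.UncompressedSize64))
		if err != nil {
			log.Fatal(err)
		}
		for _, nf := range inner.File {
			fmt.Println(f.Name + "/" + nf.Name)
		}
	}
}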
Note the guarantees io.ReaderAt requires (https://godoc.org/io#ReaderAt): this implementation doesn't support parallel calls to ReadAt, and it doesn't block until it has read a full len(p) bytes, so it may not even work properly.
Upvotes: 2