How to print the content of zipped gzip'd files

Question

Ok, so I have a zip file that contains gz files (unix gzip).

Here's what I do --

def parseSTS(file):
    import zipfile, re, io, gzip
    with zipfile.ZipFile(file, 'r') as zfile:
        for name in zfile.namelist():
            if re.search(r'\.gz$', name) != None:
                zfiledata = zfile.open(name)
                print("start for file ", name)
                with gzip.open(zfiledata,'r') as gzfile:
                    print("done opening")
                    filecontent = gzfile.read()
                    print("done reading")
                    print(filecontent)

This gives the following result --

>>> 
start for file  XXXXXX.gz
done opening
done reading

Then it stays like that forever until it crashes.

What can I do with filecontent?

Edit: this is not a duplicate, since my gzipped files are inside a zip file and I'm trying to avoid extracting that zip file to disk. It works with zip files in a zip file, as per "How to read from a zip file within zip file in Python?".

selllikesybok · Accepted Answer

I created a zip file containing a gzip'ed PDF file I grabbed from the web.

I ran this code (with two small changes):

1) Fixed indenting of everything under the def statement (which I also corrected in your Question because I'm sure that it's right on your end or it wouldn't get to the problem you have).

2) I changed:

            zfiledata = zfile.open(name)
            print("start for file ", name)
            with gzip.open(zfiledata,'r') as gzfile:
                print("done opening")
                filecontent = gzfile.read()
                print("done reading")
                print(filecontent)

to:

            print("start for file ", name)
            with gzip.open(name,'rb') as gzfile:
                print("done opening")
                filecontent = gzfile.read()
                print("done reading")
                print(filecontent)

I made this change because you were passing a file object to gzip.open instead of a filename string. I have no idea how your code is executing without that change, but it was crashing for me until I fixed it.

EDIT: Adding link to GZIP docs from James R's answer --

Also, see here for further documentation:

http://docs.python.org/2/library/gzip.html#examples-of-usage

END EDIT

Now, since my gzip'ed file is small, the behavior I observe is that it pauses for about 3 seconds after printing "done reading", then outputs what is in filecontent.

I would suggest adding the following debugging line after your print("done reading") -- print(len(filecontent)). If this number is very, very large, consider not printing the entire file contents in one shot.
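
As a rough sketch of that suggestion, the snippet below reports the size of the decompressed data and prints only the first part of it. The helper name summarize and the 200-byte limit are made up for illustration, and the bytes literal stands in for whatever gzfile.read() returned in the question.

```python
def summarize(filecontent, limit=200):
    """Return the total size and a short preview of a bytes payload."""
    size = len(filecontent)
    preview = filecontent[:limit]  # slicing never fails, even on short input
    return size, preview


# Stand-in for the decompressed data (hypothetical; in the question this
# would be the result of gzfile.read()).
filecontent = b"x" * 1000000

size, preview = summarize(filecontent)
print("decompressed size:", size)
print(preview)
```

If the reported size is in the hundreds of megabytes, the apparent hang is most likely the terminal struggling with the output rather than the decompression itself.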

I would also suggest reading this for more insight into what I expect is your problem: Why is printing to stdout so slow? Can it be sped up?

EDIT 2 - an alternative if your system does not handle file I/O on zip files, which causes "no such file" errors in the code above:

def parseSTS(afile):
    import zipfile
    import gzip
    import io
    with zipfile.ZipFile(afile, 'r') as archive:
        for name in archive.namelist():
            if name.endswith('.gz'):
                # Read the compressed member into memory (nothing is extracted to disk)
                bfn = archive.read(name)
                bfi = io.BytesIO(bfn)
                # Wrap the in-memory bytes so gzip can decompress them
                with gzip.GzipFile(fileobj=bfi, mode='rb') as g:
                    qqq = g.read()
                print(qqq)

parseSTS('t.zip')
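
If the gzip'ed members are large, a variation on the above is to decompress them in chunks instead of holding the whole result in memory. This is only a sketch and assumes Python 3, where gzip.GzipFile can wrap the file object returned by ZipFile.open; the names iter_gz_members and build_demo_zip, the 64 KiB chunk size, and the demo archive contents are all invented for illustration.

```python
import gzip
import io
import zipfile


def iter_gz_members(zip_source, chunk_size=64 * 1024):
    """Yield (member_name, chunk) pairs for every .gz member of a zip archive."""
    with zipfile.ZipFile(zip_source, "r") as archive:
        for name in archive.namelist():
            if not name.endswith(".gz"):
                continue
            # Open the member as a stream; nothing is extracted to disk.
            with archive.open(name) as member:
                with gzip.GzipFile(fileobj=member, mode="rb") as gz:
                    while True:
                        chunk = gz.read(chunk_size)
                        if not chunk:
                            break
                        yield name, chunk


def build_demo_zip():
    """Build a small in-memory zip holding one gzip member (demo data only)."""
    payload = gzip.compress(b"hello from inside a gzip member\n" * 3)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("demo.gz", payload)
        archive.writestr("ignored.txt", b"not gzip")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name, chunk in iter_gz_members(build_demo_zip()):
        # Print sizes rather than the data itself to keep the output short.
        print(name, len(chunk))
```

ZipFile accepts either a path or a file-like object, so the same function can be called as iter_gz_members('t.zip') for an archive on disk.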
