Read multiple files from concatenated gzip in Python

Question

If I concatenate one gzipped file with another, is it possible to read the original files separately in Python?

Ex:

cat f1.csv.gz f2.csv.gz > f3.csv.gzip

I know this is possible in Go, but is there a way to do this in Python?

Mark Adler · Accepted Answer

Yes. Create z = zlib.decompressobj(31) (a wbits value of 31, i.e. 16 + 15, selects gzip decoding) and feed it input until z.unused_data is non-empty or you have processed all of the input. A non-empty z.unused_data holds the start of the next gzip stream. Create a new decompression object with zlib.decompressobj(31) and start decompressing with the contents of z.unused_data, then continue with more data from the file.

This prints the uncompressed size of each concatenated gzip component:

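#!/usr/bin/env python3
import sys
import zlib

# wbits=31 (16 + 15) selects gzip decoding with a 32 KiB window.
z = zlib.decompressobj(31)
count = 0
while True:
    if z.unused_data == b"":
        # Nothing left over: read more compressed bytes from stdin.
        buf = sys.stdin.buffer.read(8192)
        if buf == b"":
            break
    else:
        # The previous component has ended; report its size and start
        # a fresh decompressor on the leftover bytes.
        print(count)
        count = 0
        buf = z.unused_data
        z = zlib.decompressobj(31)
    got = z.decompress(buf)
    count += len(got)
print(count)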
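
To read the files themselves rather than just their sizes, the same loop can be wrapped in a generator. The following is a minimal sketch, not part of the original answer: iter_gzip_members and chunk_size are hypothetical names chosen for illustration, and it assumes Python 3 and that each component fits in memory. It relies on the eof attribute of the decompression object to detect the end of a component.

```python
import gzip
import io
import zlib


def iter_gzip_members(fileobj, chunk_size=8192):
    """Yield the decompressed bytes of each gzip component in fileobj.

    Illustrative helper (not a zlib API); fileobj must be opened in binary mode.
    """
    z = zlib.decompressobj(31)  # 31 = 16 + 15: gzip decoding
    parts = []
    started = False  # True once the current component has received input
    buf = fileobj.read(chunk_size)
    while buf:
        started = True
        parts.append(z.decompress(buf))
        if z.eof:
            # Component finished: yield it, then restart on the leftover bytes.
            yield b"".join(parts)
            parts = []
            buf = z.unused_data
            z = zlib.decompressobj(31)
            started = False
        else:
            buf = b""
        if not buf:
            buf = fileobj.read(chunk_size)
    if started:
        raise ValueError("input ended in the middle of a gzip component")


# In-memory stand-in for: cat f1.csv.gz f2.csv.gz > f3.csv.gzip
data = gzip.compress(b"a,b\n1,2\n") + gzip.compress(b"c,d\n3,4\n")
for member in iter_gzip_members(io.BytesIO(data)):
    print(len(member))
```

For a file on disk, pass the result of open("f3.csv.gzip", "rb") instead of the io.BytesIO object. For components too large to hold in memory, each decompressed chunk could be written out as it is produced instead of being collected in a list.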
