Bytes from gzip file to text in python

Question

Once the contents of a gzip file are read into a string called text, they look like gibberish. How can I turn them into something human-readable?

with open("zipped_ex.gz.2016", "rb") as f:  # binary mode: the file holds compressed bytes
    text = f.read()
    print(text)

Note: I'm not looking for a way to go from zipped_ex.gz.2016 to the contents. Instead, I'm looking for a way to go from the bytestring to the contents.

Torxed · Accepted Answer

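import gzip

# GzipFile decompresses transparently while reading
with gzip.GzipFile("zipped_ex.gz.2016") as f:
    text = f.read()  # decompressed bytes, not the raw file contents
    print(text)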

On disk, the file is a binary blob that is not human-readable.
To work with the data inside it, you need to decompress it somehow.

In this case, the decompression happens in memory: the GzipFile class from the gzip module decompresses the data on the fly, so f.read() returns the decompressed contents rather than the compressed bytes stored on disk.
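To make the difference concrete, here is a minimal sketch, assuming Python 3 and UTF-8 encoded text. The temporary file name and the sample string are made up for illustration; gzip.open in "rt" mode is used here instead of GzipFile so that the result is a str rather than bytes.

```python
import gzip
import os
import tempfile

# Hypothetical sample data; a real gzip file would already exist on disk.
original = "hello, gzip\n"
path = os.path.join(tempfile.mkdtemp(), "example.gz")
with gzip.open(path, "wt", encoding="utf-8") as f:
    f.write(original)

# Plain open() returns the compressed bytes (the "gibberish").
with open(path, "rb") as f:
    raw = f.read()

# gzip.open() decompresses on the fly; "rt" also decodes the bytes to str.
with gzip.open(path, "rt", encoding="utf-8") as f:
    text = f.read()

print(raw[:2])  # the two-byte gzip magic number
print(text)
```

If the file is not UTF-8, the encoding argument would need to match whatever was used when the data was written.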

The same module can be used on a bytes string:

import io
import gzip

f = io.BytesIO(b"Your compressed gzip-file content here")
with gzip.GzipFile(fileobj=f) as fh:
    plain_text = fh.read()
    print(plain_text)
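If the whole bytestring is already in memory, gzip.decompress may be a shorter route than wrapping it in BytesIO. The sketch below assumes Python 3 and that the original data was UTF-8 text; the compressed value is generated on the spot as a stand-in for bytes obtained elsewhere.

```python
import gzip

# Stand-in for a bytestring obtained elsewhere (a socket, a database, f.read()).
compressed = gzip.compress("some human-readable text".encode("utf-8"))

# gzip.decompress() goes from compressed bytes to the original bytes...
plain_bytes = gzip.decompress(compressed)

# ...and decode() turns those bytes into str. UTF-8 is an assumption here.
plain_text = plain_bytes.decode("utf-8")
print(plain_text)
```

The GzipFile approach is still the better fit for large inputs, since it can be read in chunks instead of decompressing everything at once.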

Note: a gzip file is a single compressed data stream. If it wraps a tar archive that holds several text files, you also need to unpack the tar layer; have a look at this question: How do I compress a folder with the Python GZip module?
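For the tar case, the standard tarfile module can read a gzip-compressed tar straight from bytes. This is only a sketch under assumptions: the archive is built in memory with made-up member names (a.txt, b.txt) so the example runs on its own, and the members are assumed to be UTF-8 text.

```python
import io
import tarfile

# Build a small .tar.gz in memory as a stand-in for real data.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    for name, body in [("a.txt", b"first file"), ("b.txt", b"second file")]:
        info = tarfile.TarInfo(name=name)
        info.size = len(body)
        tar.addfile(info, io.BytesIO(body))
targz_bytes = buf.getvalue()

# Read it back: mode "r:gz" decompresses and exposes the tar members.
contents = {}
with tarfile.open(fileobj=io.BytesIO(targz_bytes), mode="r:gz") as tar:
    for member in tar.getmembers():
        if member.isfile():
            contents[member.name] = tar.extractfile(member).read().decode("utf-8")

print(contents)
```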
