Everyone_Else

Reputation: 3264

Bytes from gzip file to text in python

After I read the contents of a gzip file into a string called text, it looks like gibberish. How can I turn it into something human-readable?

with open("zipped_ex.gz.2016") as f:
    text = f.read()
    print text

Note: I'm not looking for a way to go from the file zipped_ex.gz.2016 to its contents. I'm looking for a way to go from the bytestring to the contents.

Upvotes: 1

Views: 6786

Answers (1)

Torxed

Reputation: 23480

import gzip

# GzipFile decompresses while reading, so f.read() returns the original data.
with gzip.GzipFile("zipped_ex.gz.2016") as f:
    text = f.read()
    print(text)

On disk, the file is a binary blob that is not human-readable.
To work with the data inside the archive, you need to decompress it somehow.

Here that happens in memory: the GzipFile class from the gzip module decompresses the archive on the fly, so f.read() returns the archive's contents rather than the compressed bytes stored on disk.

The same class also works on an in-memory bytes string:

import io
import gzip

f = io.BytesIO(b"Your compressed gzip-file content here")
with gzip.GzipFile(fileobj=f) as fh:
    plain_text = fh.read()
    print(plain_text)
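
To go all the way from a compressed bytestring to text, something like the following may help. It is only a sketch: the sample string is made up so the example can run on its own, and UTF-8 is an assumed encoding that may not match your file. It uses gzip.decompress (Python 3.2+) for the one-shot case and GzipFile over BytesIO for the streaming case.

```python
import gzip
import io

# Hypothetical sample data, compressed here so the example is self-contained.
original = "Hello, gzip!"
compressed = gzip.compress(original.encode("utf-8"))

# One-shot decompression of an in-memory bytes object.
raw = gzip.decompress(compressed)

# The decompressed data is still bytes; decode it to get text.
# UTF-8 is an assumption, use whatever encoding the data was written in.
text = raw.decode("utf-8")

# Equivalent streaming approach: wrap the bytes in BytesIO for GzipFile.
with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as fh:
    streamed = fh.read().decode("utf-8")

print(text)
```

If the decoded result still looks wrong, the data was probably written in a different encoding than the one passed to decode().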

Note: a gzip file is a single stream of data compressed in the gzip format. If the gzip file wraps a tar archive holding several text files, have a look at this question: How do I compress a folder with the Python GZip module?
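
For the tar case, the standard tarfile module can read a gzip-compressed archive straight from bytes. A rough sketch, assuming the payload is a .tar.gz of UTF-8 text files; the file names and contents below are invented for the example, and the archive is built in memory only so the snippet can run on its own:

```python
import io
import tarfile

# Build a small hypothetical .tar.gz in memory so there is something to read.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    for name, payload in [("a.txt", b"first file\n"), ("b.txt", b"second file\n")]:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
archive_bytes = buf.getvalue()

# Read every regular file out of the archive, starting from the raw bytes.
contents = {}
with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
    for member in tar.getmembers():
        if member.isfile():
            # extractfile() returns a binary file object; UTF-8 is assumed here.
            contents[member.name] = tar.extractfile(member).read().decode("utf-8")

print(contents)
```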

Upvotes: 4
