Reputation: 3264
Once the contents of a gzip file are read into a string called text, they look like gibberish. How can I turn them into something human-readable?
with open("zipped_ex.gz.2016") as f:
    text = f.read()
print text
Note: I'm not looking for a way to go from zipped_ex.gz.2016 to the contents. Instead, I'm looking for a way to go from the bytestring to the contents.
Upvotes: 1
Views: 6786
Reputation: 23480
import gzip
with gzip.GzipFile("zipped_ex.gz.2016") as f:
    text = f.read()
print(text)
On disk, the file is a binary blob that is not human-readable.
To work with the data inside the archive you need to extract it somehow.
In this case that happens in memory, via the GzipFile class, which decompresses the archive on the fly: when you call f.read() you get the archive's contents, not the compressed bytes stored on disk.
The same class can be used on a bytes string:
import io
import gzip
f = io.BytesIO(b"Your compressed gzip-file content here")
with gzip.GzipFile(fileobj=f) as fh:
    plain_text = fh.read()
print(plain_text)
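If you want to try this without a real file, here is a minimal round-trip sketch. It assumes Python 3, and the sample payload and the UTF-8 encoding are made up for illustration. It also shows gzip.decompress, which does the same job in one call when you already hold the whole bytes string:

```python
import gzip
import io

# Hypothetical sample data; in practice the bytes come from a file, socket, etc.
original = b"hello, gzip\n"
compressed = gzip.compress(original)

# Option 1: wrap the bytes in a file-like object and read through GzipFile.
with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as fh:
    via_fileobj = fh.read()

# Option 2: decompress the whole bytes string in one call (Python 3.2+).
via_decompress = gzip.decompress(compressed)

# Both results are bytes; decode to get a str (UTF-8 is an assumption here).
text = via_decompress.decode("utf-8")
print(text)
```

The file-object route is the better fit when the data is large or arrives as a stream, since you can read it in chunks. gzip.decompress is simpler when everything is already in memory.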
Note: a gzip file is a single data stream compressed with the gzip format. If you have several text files bundled with tar and then compressed with gzip, and you want to work with the tar archive inside, have a look at this question: How do I compress a folder with the Python GZip module?
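For the tar case, the standard-library tarfile module can read a gzip-compressed tar straight from a bytes string. This is only a rough sketch, assuming Python 3. The member names and contents are invented, and the archive is built in memory purely so the example runs on its own:

```python
import io
import tarfile

# Build a small .tar.gz in memory so the example is self-contained.
# The member names and contents are made up for illustration.
files = {"a.txt": b"first file\n", "b.txt": b"second file\n"}
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    for name, data in files.items():
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
archive_bytes = buf.getvalue()

# Read the members back from the bytes string.
extracted = {}
with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
    for member in tar.getmembers():
        if member.isfile():
            extracted[member.name] = tar.extractfile(member).read()

print(extracted)
```

Here mode="r:gz" tells tarfile to handle the gzip layer itself, so there is no need to go through GzipFile first.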
Upvotes: 4