Reputation: 44405
In Python 3 (3.6.8) I want to read a gzipped tar file and list its contents.
I found this solution, which yields an error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
Searching for this error, I found this suggestion, so I tried the following code snippet:
with open(out_file) as fd:
    gzip_fd = gzip.GzipFile(fileobj=fd)
    tar = tarfile.open(gzip_fd.read())
which yields the same error!
So how do I do this correctly?
After looking at the actual documentation here, I came up with the following code:
tar = tarfile.open(out_file, "w:gz")
for member in tar.getnames():
    print(tar.extractfile(member).read())
which finally ran without errors, but did not print any content of the tar archive on the screen!
The tar file is well formatted and contains folders and files. (I need to try to share this file)
Upvotes: 1
Views: 4166
Reputation: 44405
Not sure why it did not work before, but the following solution works for me to list the files and folders of a gzipped tar archive with Python 3.6:
import tarfile
with tarfile.open(filename, "r:gz") as tar:
    print(tar.getnames())
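A likely explanation for the earlier attempt printing nothing: the mode string "w:gz" opens the archive for writing, which truncates the existing file and starts with an empty member list, so the loop body never runs. The sketch below contrasts the two modes. The helper names `list_tar_gz` and `make_sample_archive` and the sample file names are made up for illustration, and the archive is created in a temporary directory so nothing real gets overwritten.

```python
import io
import os
import tarfile
import tempfile


def list_tar_gz(path):
    """Return the member names of a gzipped tar archive, opened read-only."""
    with tarfile.open(path, "r:gz") as tar:
        return tar.getnames()


def make_sample_archive(path):
    """Write a tiny archive (one folder, one file) so the sketch is self-contained."""
    with tarfile.open(path, "w:gz") as tar:
        folder = tarfile.TarInfo("docs")
        folder.type = tarfile.DIRTYPE
        tar.addfile(folder)
        data = b"hello\n"
        info = tarfile.TarInfo("docs/readme.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))


sample = os.path.join(tempfile.mkdtemp(), "sample.tar.gz")
make_sample_archive(sample)

# Read mode: the members written above are listed.
names = list_tar_gz(sample)
print(names)

# Write mode: the file is truncated and the member list starts out empty.
with tarfile.open(sample, "w:gz") as tar:
    names_after_write_open = tar.getnames()
print(names_after_write_open)
```

If that is what happened, the original archive was emptied by the "w:gz" call and has to be restored from a copy before "r:gz" can list anything.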
Upvotes: 1
Reputation: 888
The python-archive module (available via pip) could help you:
from archive import extract
file = "your/file.tgz"
try:
    extract(file, "out/%s.raw" % (file), ext=".tgz")
except Exception:
    # could not extract
    pass
Available extensions are (v0.2): '.zip', '.egg', '.jar', '.tar', '.tar.gz', '.tgz', '.tar.bz2', '.tz2'
More info: https://pypi.org/project/python-archive/
Upvotes: 0
Reputation: 11151
When you open a file without specifying a mode, it defaults to reading it as text. You need to open the file as a raw byte stream using the mode='rb' flag, then feed it to the gzip reader:
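import gzip
import tarfile
with open(out_file, mode='rb') as fd:
    gzip_fd = gzip.GzipFile(fileobj=fd)
    tar = tarfile.open(fileobj=gzip_fd)  # pass the stream as fileobj, not its bytes as a name
    print(tar.getnames())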
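To go one step further and print file contents, here is a minimal sketch of the same binary-stream approach. It assumes the decompressed stream is handed to tarfile through the `fileobj` argument, and it skips directories because `extractfile()` returns None for them. The function name `read_members` and the in-memory sample archive are hypothetical, only there to make the example runnable on its own.

```python
import gzip
import io
import tarfile


def read_members(fileobj):
    """Yield (name, content) for every regular file in a gzipped tar stream."""
    with gzip.GzipFile(fileobj=fileobj) as gzip_fd:
        # mode="r:" tells tarfile the stream is already decompressed.
        with tarfile.open(fileobj=gzip_fd, mode="r:") as tar:
            for member in tar:
                # extractfile() returns None for directories, so skip them.
                if member.isfile():
                    yield member.name, tar.extractfile(member).read()


# Build a small in-memory archive so the example runs without any files on disk.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    folder = tarfile.TarInfo("data")
    folder.type = tarfile.DIRTYPE
    tar.addfile(folder)
    payload = b"alpha"
    info = tarfile.TarInfo("data/a.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
buf.seek(0)

contents = list(read_members(buf))
for name, content in contents:
    print(name, content)
```

With a file on disk, the same function should work on a handle opened in binary mode, e.g. `with open(out_file, 'rb') as fd: ...`. In most cases `tarfile.open(out_file, "r:gz")` does the gzip handling by itself, so the explicit GzipFile wrapper is mainly useful when the data already arrives as a stream.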
Upvotes: 0