Reputation: 771
I have an archive structured like this:

main_archive.tar.gz
    main_archive.tar
        sub_archive.xml.gz
            actual_file.xml

There are hundreds of files in this archive. Can the gzip package be used with multiple files in Python 3? I've only used it with a single compressed file, so I'm at a loss as to how to go over multiple files or multiple levels of "zipping".
My usual method of decompressing is:

    import gzip

    with gzip.open(file_path, "rb") as f:
        for ln in f:
            ...  # *decode encoding here*

Of course, this has multiple problems, because usually "f" is just a single file. Now I'm not sure what it represents.
Any help/advice would be much appreciated!
EDIT 1:
I've accepted the answer below, but if you're looking for similar code, my backbone was basically:

    import gzip
    import tarfile
    import xml.etree.ElementTree

    with tarfile.open(file_path, mode="r") as tar:
        for member in tar.getmembers():
            f = tar.extractfile(member)
            if f is None:  # directories and other non-file members
                continue
            if verbose:
                print("Decoding", member.name, "...")
            with gzip.open(f, "rb") as temp:
                decoded = temp.read().decode("UTF-8")
            # fromstring() parses XML text; parse() expects a file name or file object
            e = xml.etree.ElementTree.fromstring(decoded)
            for child in e:
                print(child.tag)
                print(child.attrib)
            print("\n\n")
Main packages used were gzip, tarfile, and xml.etree.ElementTree.
Upvotes: 1
Views: 10071
Reputation: 189377
gzip only supports compressing a single file or stream. In your case, the extracted stream is a tar archive, so you'd use Python's tarfile library to manipulate the extracted contents. This library actually knows how to cope with .tar.gz, so you don't need to explicitly extract the gzip layer yourself.
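As a minimal sketch of that point, the snippet below opens a .tar.gz directly with tarfile and lists the regular files inside it. The function name list_tar_gz_members and its archive_path parameter are made up for illustration; only tarfile.open, getmembers, and isfile come from the standard library.

```python
import tarfile


def list_tar_gz_members(archive_path):
    """Return the names of the regular files inside a .tar.gz archive."""
    # Mode "r:gz" tells tarfile to undo the gzip layer itself,
    # so no separate gzip.open() call is needed for the outer archive.
    with tarfile.open(archive_path, mode="r:gz") as tar:
        return [member.name for member in tar.getmembers() if member.isfile()]
```

Calling list_tar_gz_members("main_archive.tar.gz") on the archive from the question should give the names of the inner .xml.gz files, which still need their own gzip pass.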
Upvotes: 4
Reputation: 112339
Use Python's tarfile to get the contained files, and then Python's gzip again inside the loop to extract the xml.
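A rough sketch of that loop, assuming every regular member of the tar is itself a gzip-compressed text file encoded as UTF-8 (the question doesn't state the encoding, so that default is a guess). The name iter_inner_gzip_texts is hypothetical; parsing the XML is left to the caller.

```python
import gzip
import tarfile


def iter_inner_gzip_texts(archive_path, encoding="utf-8"):
    """Yield (member name, decoded text) for each gzipped file in a tar archive."""
    # "r:*" lets tarfile detect the outer compression (gzip here) on its own.
    with tarfile.open(archive_path, mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, and other non-file entries
            raw = tar.extractfile(member)
            # Second gzip pass: decompress the inner .xml.gz stream.
            with gzip.open(raw, "rb") as inner:
                yield member.name, inner.read().decode(encoding)
```

Each yielded string can then be handed to xml.etree.ElementTree.fromstring to get the root element.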
Upvotes: 0