Use gzip with archive with multiple files in Python 3

Question

So basically I have a file system like this:

main_archive.tar.gz
  main_archive.tar
    sub_archive.xml.gz
      actual_file.xml

There are hundreds of files in this archive... So basically, can the gzip package be used with multiple files in Python 3? I've only used it with a single file zipped so I'm at a loss on how to go over multiple files or multiple levels of "zipping".

My usual method of decompressing is:

with gzip.open(file_path, "rb") as f:
  for ln in f.readlines():
    *decode encoding here*

Of course, this has multiple problems because usually "f" is just a file... But now I'm not sure what it represents?

Any help/advice would be much appreciated!

EDIT 1:

I've accepted the answer below, but if you're looking for similar code, my backbone was basically:

tar = tarfile.open(file_path, mode="r")
for member in tar.getmembers():
    f = tar.extractfile(member)
    if verbose:
        print("Decoding", member.name, "...")
    with gzip.open(f, "rb") as temp:
        decoded = temp.read().decode("UTF-8")
        e = xml.etree.ElementTree.parse(decoded).getroot()
        for child in e:
            print(child.tag)
            print(child.attrib)
            print("

")

tar.close()

Main packages used were gzip, tarfile, and xml.etree.ElementTree.

tripleee · Accepted Answer

gzip only supports compressing a single file or stream. In your case, the extracted stream is a tar object, so you'd use Python's tarfile library to manipulate the extracted contents. This library actually knows how to cope with .tar.gz so you don't need to explicitly extract the gzip yourself.

Use gzip with archive with multiple files in Python 3

Answers (2)

Related Questions