gnomeAware

Reputation: 93

python tarfile recursive extract in memory

I have a tar file that contains compressed tar files, like this:

gnomeAware@devserv:~$ tar tf test.tar
File1.tar.gz
File2.tar.gz
File3.tar.gz
File4.tar.gz

tarfile expects a string as the file to open. Is there any way to pass it a file object?

tar = tarfile.open('test.tar', 'r') # Unpack tar
for item in tar:
  Bundle=tar.extractfile(item) # Pull out the file
  t = tarfile.open(Bundle, "r:gz") # Unpack tar
  for tItem in t:
  ...

Thanks.

Upvotes: 8

Views: 13614

Answers (2)

JulianWgs

Reputation: 1059

Here is a way to read the data of each file in the archive:

import tarfile

filename = "archive.tar.gz"

with tarfile.open(filename, "r:gz") as file:
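    # Iterate over the archive itself rather than file.members,
    # which may not list every nested file and folder.
    for member in file:
        # extractfile() returns None for directories, so skip them.
        if not member.isfile():
            continue
        # You need additional code to save the data into a list.
        file_content_byte = file.extractfile(member).read()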
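
To keep the data instead of overwriting it on each pass of the loop, one option is to collect it in a dict keyed by member name. This is only a minimal sketch: the function name read_all_members and the dict layout are illustrative choices, not part of tarfile.

```python
import tarfile


def read_all_members(path, mode="r:gz"):
    """Return {member name: bytes} for every regular file in the archive.

    Hypothetical helper for illustration; not part of the tarfile module.
    """
    contents = {}
    with tarfile.open(path, mode) as archive:
        for member in archive:
            # extractfile() returns None for directories, so only read files.
            if member.isfile():
                contents[member.name] = archive.extractfile(member).read()
    return contents
```

Calling read_all_members("archive.tar.gz") loads every file into memory at once, so this only suits archives that comfortably fit in RAM.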

If you already know the name of the file in the archive, you can do this:

import tarfile

filename = "archive.tar.gz"

with tarfile.open(filename, "r:gz") as file:
    file_content_byte = file.extractfile("file.txt").read()

Upvotes: 10

Ward

Reputation: 2852

The definition of tarfile.open looks like this:

def open(cls, name=None, mode="r", fileobj=None, bufsize=RECORDSIZE, **kwargs):

And the Python documentation says:

If fileobj is specified, it is used as an alternative to a file object opened for name. It is supposed to be at position 0.

So, instead of passing a file name as the positional argument, pass a file object with the fileobj keyword argument. The object returned by tar.extractfile(item) in the question is such a file object.

import tarfile

with open('archive.tar', 'rb') as f:
    # Pass the file object via fileobj= instead of a file name
    with tarfile.open(fileobj=f, mode='r:') as tar:  # Unpack tar
        for item in tar:
            print(item)
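
Applied to the nested case in the question, the same keyword argument lets each inner archive be opened straight from the object that extractfile() returns, without writing anything to disk. The following is a sketch under some assumptions: the helper name read_nested_archives and the nested dict it returns are made up for illustration, and the inner archives are assumed to be gzip-compressed as in the question's listing.

```python
import tarfile


def read_nested_archives(outer, inner_mode="r:gz"):
    """Return {inner archive name: {member name: bytes}} without touching disk.

    `outer` is an already opened tarfile.TarFile. The function name and the
    returned layout are illustrative, not part of the tarfile module.
    """
    contents = {}
    for item in outer:
        if not item.isfile():
            continue  # directories and links carry no archive data
        bundle = outer.extractfile(item)  # read-only file-like object
        # Hand the object over via fileobj= rather than as the file name.
        with tarfile.open(fileobj=bundle, mode=inner_mode) as inner:
            contents[item.name] = {
                m.name: inner.extractfile(m).read()
                for m in inner
                if m.isfile()
            }
    return contents
```

With the question's file this could be used as: with tarfile.open('test.tar', 'r') as tar: data = read_nested_archives(tar). Everything is held in memory, so very large bundles may call for processing one inner member at a time instead.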

Upvotes: 8
