svc

Reputation: 41

How can I check if a file exists in a tar archive with Python?

I would like to verify that a given file exists in a tar archive with Python before I get it as a file-like object. I tried isreg(), but I am probably doing something wrong.

I tried

import tarfile


tar = tarfile.open("sample.tar", "w")
tar.add("test1.txt")
tar.add("test2.txt")
tar.add("test3.py")
tar.close()

tar = tarfile.open("sample.tar", "r")
tai = tar.tarinfo(name="test3.py")
print(tai.isreg())
print(tai.size())
tar.close()

Probably tai is wrong. In fact tai.size() is always 0.

Upvotes: 4

Views: 6614

Answers (5)

Paolo Rovelli

Reputation: 9697

To retrieve all the files inside a tar archive you can use either the getmembers() or the getnames() methods of a TarFile object. Then, to extract them, you can use either the extract() or extractfile() methods.

For example:

import tarfile

# Archive: "sample.tar" >> Content: "test1.txt", ...
filename = "test1.txt"
with tarfile.open("sample.tar", "r") as tar:
    if filename in tar.getnames():
        # extractfile() returns a file-like object for regular files
        file = tar.extractfile(filename).read()

Keep in mind, however, that the names returned are actually relative file paths. That is, if the "test1.txt" file you're looking for is stored in a "test" sub-directory inside the tar archive, then its TarInfo.name will actually be "test/test1.txt".

So, going back to the previous example, you should do something like:

# Archive: "sample.tar" >> Content: "test", "test/test1.txt", ...
filename = "test1.txt"
with tarfile.open("sample.tar", "r") as tar:
    for name in tar.getnames():
        # Note: endswith() also matches names such as "test/mytest1.txt"
        if name.endswith(filename):
            file = tar.extractfile(name).read()

Finally, to test it, you can use @patch() from unittest.mock to mock tarfile.open().

For example:

import unittest
from unittest.mock import patch

class TestTarfile(unittest.TestCase):
    @patch('myfile.tarfile.open')  # 'myfile' stands for the module under test that imports tarfile
    def test_tarfile_open(self, mock_open):
        mock_open.return_value.__enter__.return_value.getnames.return_value = [
            "test",
            "test/test1.txt"
        ]
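The test above only configures the mock. A fuller version that actually calls some code and asserts on the result might look like the sketch below. Note that find_member is a hypothetical function invented for this example (it is not part of tarfile), and the patch target is "tarfile.open" only because everything lives in one file here; in a real project you would patch the name as seen from your own module.

```python
import tarfile
import unittest
from unittest.mock import patch


def find_member(archive_path, filename):
    """Hypothetical function under test: first member name ending in `filename`."""
    with tarfile.open(archive_path, "r") as tar:
        for name in tar.getnames():
            if name.endswith(filename):
                return name
    return None


class TestFindMember(unittest.TestCase):
    @patch("tarfile.open")
    def test_finds_file_in_subdirectory(self, mock_open):
        # The object bound by "with ... as tar" is __enter__'s return value.
        mock_open.return_value.__enter__.return_value.getnames.return_value = [
            "test",
            "test/test1.txt",
        ]
        self.assertEqual(find_member("sample.tar", "test1.txt"), "test/test1.txt")
        mock_open.assert_called_once_with("sample.tar", "r")

    @patch("tarfile.open")
    def test_returns_none_when_absent(self, mock_open):
        mock_open.return_value.__enter__.return_value.getnames.return_value = ["test"]
        self.assertIsNone(find_member("sample.tar", "test1.txt"))
```

No real archive is opened in either test, so they run without any file on disk.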

NOTE: As stated in the documentation, support for using TarFile objects as context managers in with statements was added in Python 3.2.

Upvotes: 0

Tim McNamara

Reputation: 18385

If you really need to check, then you can test for membership using the getnames method and the in operator:

>>> import tarfile
>>> tar = tarfile.open("sample.tar", "r")
>>> "test1.txt" in tar.getnames()
True

However, I think that in Python (and when dealing with file systems in general), catching exceptions is preferred. It's better to attempt the read and catch an exception, because things can change between checking for a file's existence and reading it later.

>>> try:
...     tar.getmember('contents.txt')
... except KeyError:
...     pass
...
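Combining this try/except pattern with the isreg() check from the question might look like the sketch below. read_member is a hypothetical helper name, and the archive is built in memory purely so the example runs on its own. Note that TarInfo.size is an attribute, not a method.

```python
import io
import tarfile


def read_member(tar, name):
    """Return the bytes of regular file `name` in the open TarFile `tar`, or None."""
    try:
        info = tar.getmember(name)  # raises KeyError if the name is absent
    except KeyError:
        return None
    if not info.isreg():  # skip directories, links and other non-regular members
        return None
    with tar.extractfile(info) as fh:
        return fh.read()


# Build a small archive in memory (illustrative data, not from the question).
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    payload = b"print('hello')\n"
    info = tarfile.TarInfo(name="test3.py")
    info.size = len(payload)  # size is a plain attribute on TarInfo
    tar.addfile(info, io.BytesIO(payload))

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    found = read_member(tar, "test3.py")
    missing = read_member(tar, "contents.txt")
```

Here found holds the file's bytes and missing is None, so the caller never has to check for existence separately.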

Upvotes: 7

tzot

Reputation: 96081

This matches even if the tar file has the filename in a subdirectory, and uses normcase to mimic the filename case handling of the current OS (e.g. on Windows, searching for “readme.txt” should match “README.TXT” inside the tar file).

import os

def filename_in_tar(filename, atarfile):
    filename = os.path.normcase(filename)
    return any(
        filename == os.path.normcase(os.path.basename(tfn))
        for tfn in atarfile.getnames())
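A quick way to try this out, assuming atarfile is an open TarFile object, is sketched below. make_archive is a hypothetical helper that builds a throwaway in-memory archive of empty files, and the member names are made up for the example.

```python
import io
import os
import tarfile


def filename_in_tar(filename, atarfile):
    # Same function as above, repeated so this sketch is self-contained.
    filename = os.path.normcase(filename)
    return any(
        filename == os.path.normcase(os.path.basename(tfn))
        for tfn in atarfile.getnames())


def make_archive(names):
    """Hypothetical helper: build an in-memory tar holding empty files."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in names:
            tar.addfile(tarfile.TarInfo(name=name))  # zero-length member
    buf.seek(0)
    return buf


with tarfile.open(fileobj=make_archive(["docs/readme.txt", "src/main.py"])) as tar:
    in_subdir = filename_in_tar("readme.txt", tar)  # basename matches inside docs/
    absent = filename_in_tar("setup.py", tar)       # no such member
    partial = filename_in_tar("adme.txt", tar)      # a bare suffix is not a match
```

Because the comparison is on the whole basename, a partial suffix does not count as a hit, which is the main difference from an endswith() check.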

Upvotes: 0

David Dean

Reputation: 7701

You can use tar.getnames() and the in operator to do it:

$ touch a.txt
$ tar cvf a.tar a.txt
$ python
>>> import tarfile
>>> names = tarfile.open('a.tar').getnames()
>>> 'a.txt' in names
True
>>> 'b.txt' in names
False

Upvotes: 0

Steve Tjoa

Reputation: 61124

Maybe use getnames()?

import tarfile

tar = tarfile.open('sample.tar', 'r')
if 'test3.py' in tar.getnames():
    print('test3.py is in sample.tar')

Upvotes: 0
