hpixel

Reputation: 260

Check tar archive before extractall

The Python documentation advises against extracting a tar archive without inspecting it first. What is the best way to make sure an archive is safe using the tarfile module? Should I just iterate over all the filenames and check whether they contain absolute pathnames?

Would something like the following be sufficient?

import sys
import tarfile

with tarfile.open('sample.tar', 'r') as tarf:
    # getnames() returns the name of every member in the archive
    for n in tarf.getnames():
        if n.startswith('/') or n.startswith('..'):
            print('sample.tar contains unsafe filenames')
            sys.exit(1)
    tarf.extractall()

Edit

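This script is not compatible with Python versions before 2.7, because TarFile objects only support the with statement from 2.7 onward. On older versions the archive can be wrapped in contextlib.closing instead.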

I now iterate over the members:

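import os
import sys
import tarfile
from contextlib import closing  # lets TarFile work with "with" before Python 2.7

target_dir = "/target/"  # trailing slash matters for the prefix check
with closing(tarfile.open('sample.tar', mode='r:gz')) as tarf:
    for m in tarf:
        # Resolve the member path relative to the target directory
        pathn = os.path.abspath(os.path.join(target_dir, m.name))
        if not pathn.startswith(target_dir):
            print('The tar file contains unsafe filenames. Aborting.')
            sys.exit(1)
        # Members before an unsafe one have already been extracted at this point
        tarf.extract(m, path=target_dir)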

Upvotes: 4

Views: 1753

Answers (1)

David Wolever

Reputation: 154682

Almost, but a path like foo/../../ would still get through, because the .. does not appear at the start of the name.
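To make the gap concrete, here is a minimal sketch that reproduces the question's check as a standalone function (the name `naive_is_safe` is hypothetical, chosen just for illustration) and feeds it a few names:

```python
def naive_is_safe(name):
    """Reproduce the question's check: reject names starting with '/' or '..'."""
    return not (name.startswith('/') or name.startswith('..'))


# Absolute paths and a leading '..' are rejected...
print(naive_is_safe('/etc/passwd'))           # False
print(naive_is_safe('../etc/passwd'))         # False
# ...but a '..' further along the path is accepted.
print(naive_is_safe('foo/../../etc/passwd'))  # True
```

The last name starts with an ordinary directory, so a prefix test never sees the `..` components that climb back out of the target directory.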

A better approach is to use os.path.join and os.path.abspath, which together correctly handle a leading / and .. components anywhere in the path:
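A small sketch of what the two calls do to each kind of name, assuming POSIX path semantics and the answer's example directory `/target/` (the helper `resolve_member` is hypothetical):

```python
import os.path


def resolve_member(target_dir, name):
    """Join a member name onto target_dir and normalise the result."""
    # join() discards target_dir entirely when name is absolute;
    # abspath() collapses any '..' components in the joined path.
    return os.path.abspath(os.path.join(target_dir, name))


target_dir = "/target/"
print(resolve_member(target_dir, "docs/readme.txt"))       # /target/docs/readme.txt
print(resolve_member(target_dir, "/etc/passwd"))           # /etc/passwd
print(resolve_member(target_dir, "foo/../../etc/passwd"))  # /etc/passwd
```

Only the first result still starts with `/target/`, so a single prefix test on the resolved path catches both the absolute name and the embedded traversal.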

import os
import sys
import tarfile

target_dir = "/target/"  # trailing slash is important
with tarfile.open(…) as tarf:
    for n in tarf.getnames():
        # join() discards target_dir if n is absolute; abspath() collapses any ".."
        if not os.path.abspath(os.path.join(target_dir, n)).startswith(target_dir):
            print("unsafe filenames!")
            sys.exit(1)
    tarf.extractall(path=target_dir)
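The same idea can be packaged as a reusable function. This is a sketch under a few assumptions of my own: the name `safe_extract` and the `build_tar` test helper are hypothetical, it raises `ValueError` instead of calling `sys.exit`, and, like the answer, it inspects member names only. It also validates every name before extracting anything, unlike the member-by-member loop in the question's edit, and it derives the trailing separator itself so callers cannot forget it.

```python
import io
import os
import tarfile


def safe_extract(archive, target_dir):
    """Extract an open TarFile into target_dir after checking every member name."""
    # abspath() strips any trailing slash, so add os.sep back; without it,
    # a base of "/target" would also accept "/target2/...".
    base = os.path.abspath(target_dir) + os.sep
    for name in archive.getnames():
        resolved = os.path.abspath(os.path.join(base, name))
        # Appending os.sep lets a member named "." (the target itself) pass.
        if not (resolved + os.sep).startswith(base):
            raise ValueError("unsafe member name: %r" % name)
    # Only reached once every name has passed the check.
    archive.extractall(path=target_dir)


def build_tar(names):
    """Build a small in-memory archive with one short file per name (for trying it out)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in names:
            data = b"hello"
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return tarfile.open(fileobj=buf, mode="r")
```

Usage: `safe_extract(build_tar(["docs/readme.txt"]), some_dir)` extracts normally, while an archive that also contains `foo/../../evil.txt` raises `ValueError` before any file is written.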

Upvotes: 4
