Balint

Reputation: 55

Python: how can I access tarfile.add()'s 'name' parameter in add()'s filter function?

I would like to filter out (skip) certain subdirectories while creating a tar.gz file with tarfile (Python 3.4).

Files on disk:

I'm trying to compress /home/myuser/temp/test1/ with tarfile.add().

I use it both with and without the full path. With the full path it works, but with the short path (arcname) directory exclusion doesn't work, because the TarInfo object that tarfile.add() passes to the filter carries the arcname, not the name parameter!

archive.add(entry, arcname=os.path.basename(entry), filter=self.filter_general)

Example:

file: /home/myuser/temp/test1/thing/bar.jpg -> name in the archive = test1/thing/bar.jpg

Because /home/myuser/temp/test1/thing is in exclude_dir_fullpath, the filter should exclude this file, but it can't, because the filter only sees item.name, which is test1/thing/bar.jpg.
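To see exactly what the filter receives, here is a small self-contained check. It is only an illustration: instead of the real /home/myuser paths it builds a throwaway test1/thing/bar.jpg tree in a temporary directory, writes the archive to memory, and records the TarInfo.name of every member passed to the filter (names_seen_by_filter and recording_filter are hypothetical helpers).

```python
import os
import tarfile
import tempfile
from io import BytesIO


def names_seen_by_filter(root):
    """Archive `root` under its basename and return every TarInfo.name the filter receives."""
    seen = []

    def recording_filter(tarinfo):
        seen.append(tarinfo.name)  # this is the archive name, not the on-disk path
        return tarinfo             # keep every member

    buf = BytesIO()  # in-memory archive, so nothing is written to disk
    with tarfile.open(fileobj=buf, mode="w:gz") as archive:
        archive.add(root, arcname=os.path.basename(root), filter=recording_filter)
    return seen


with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "test1")
    os.makedirs(os.path.join(root, "thing"))
    open(os.path.join(root, "thing", "bar.jpg"), "wb").close()
    names = names_seen_by_filter(root)

print(names)  # ['test1', 'test1/thing', 'test1/thing/bar.jpg']
```

None of the recorded names contains the temporary directory's path, which is why a filter that compares against absolute paths never matches.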

How can I access tarfile.add()'s name parameter inside the filter function?

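import os
import tarfile


def filter_general(item):
    # item is a TarInfo; item.name holds the archive name (e.g. "test1/thing/bar.jpg")
    exclude_dir_fullpath = ['/home/myuser/temp/test1/thing', '/home/myuser/temp/test1/lemon']
    if any(dirname in item.name for dirname in exclude_dir_fullpath):
        print("Exclude fullpath dir matched at: %s" % item.name)  # DEBUG
        return None  # returning None drops the member from the archive
    return item


def compress_tar():
    filepath = '/tmp/test.tar.gz'
    include_dir = '/home/myuser/temp/test1'  # no trailing slash, so basename() gives "test1"
    # the with-block closes the archive, which flushes the gzip stream to disk
    with tarfile.open(name=filepath, mode="w:gz") as archive:
        archive.add(include_dir, arcname=os.path.basename(include_dir), filter=filter_general)


compress_tar()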
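A side note on the trailing slash in /home/myuser/temp/test1/: os.path.basename() of a path that ends in a slash is an empty string, so arcname would be '' and, as far as I can tell from how tarfile joins names while recursing, the members would then be stored as thing/bar.jpg rather than test1/thing/bar.jpg. Normalizing the path first avoids this. This quick check assumes POSIX-style paths:

```python
import os.path

path = '/home/myuser/temp/test1/'

# With a trailing slash the last component is empty, so basename() returns ''
print(repr(os.path.basename(path)))  # ''
print(repr(os.path.dirname(path)))   # '/home/myuser/temp/test1'

# normpath() strips the trailing slash, giving the values the code expects
normalized = os.path.normpath(path)
print(repr(normalized))                    # '/home/myuser/temp/test1'
print(repr(os.path.basename(normalized)))  # 'test1'
print(repr(os.path.dirname(normalized)))   # '/home/myuser/temp'
```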

Upvotes: 0

Views: 476

Answers (1)

Jean-François Fabre

Reputation: 140178

You want a general, reusable function that filters out files by their absolute path. Filtering on the archive name alone is not enough, since whether a file should be included can depend on where it originates on disk.

First, add a root_dir parameter to your filter function and rebuild the full path from it:

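def filter_general(item, root_dir):
    # item.name is the archive name; joining it with root_dir rebuilds the on-disk path
    full_path = os.path.join(root_dir, item.name)
    # ... compare full_path against your exclusion list and return None to skip the item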

Then replace your "add to archive" line with:

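include_dir = os.path.normpath(include_dir)  # drop any trailing slash so basename()/dirname() behave
archive.add(include_dir, arcname=os.path.basename(include_dir), filter=lambda x: filter_general(x, os.path.dirname(include_dir)))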

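The filter function is wrapped in a lambda that passes the parent directory of include_dir as root_dir. It has to be the parent: item.name already starts with the arcname (test1), so joining it with include_dir itself would repeat the test1 component.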

Now the filter function knows the root directory, so you can filter by absolute path and reuse the same function wherever you add files to an archive.
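Putting these pieces together, a complete version might look like the sketch below. It goes a little beyond the answer and rests on a few assumptions: the exclusion list is passed in as a hypothetical exclude_dirs parameter, matching is done on whole path components (so excluding /home/myuser/temp/test1/thing does not also drop a sibling such as thingy, which the substring test `dirname in item.name` would), and functools.partial is used in place of the lambda. The usage part builds a throwaway tree in a temporary directory instead of the real /home/myuser paths.

```python
import functools
import os
import tarfile
import tempfile


def filter_general(item, root_dir, exclude_dirs):
    """Drop `item` if its on-disk path is one of `exclude_dirs` or lies below one."""
    # item.name is the archive name, e.g. "test1/thing/bar.jpg"; rebuild the real path
    full_path = os.path.normpath(os.path.join(root_dir, item.name))
    for excluded in exclude_dirs:
        excluded = os.path.normpath(excluded)
        # match the directory itself or anything under it, but not ".../thingy"
        if full_path == excluded or full_path.startswith(excluded + os.sep):
            return None  # None tells tarfile to skip this member
    return item


def compress_tar(filepath, include_dir, exclude_dirs):
    include_dir = os.path.normpath(include_dir)  # drop a trailing slash, if any
    root_dir = os.path.dirname(include_dir)      # parent dir: item.name already starts with the basename
    member_filter = functools.partial(filter_general, root_dir=root_dir, exclude_dirs=exclude_dirs)
    with tarfile.open(name=filepath, mode="w:gz") as archive:
        archive.add(include_dir, arcname=os.path.basename(include_dir), filter=member_filter)


# Usage on a throwaway tree: test1/{thing,lemon,keep}/bar.jpg
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "test1")
    for sub in ("thing", "lemon", "keep"):
        os.makedirs(os.path.join(src, sub))
        open(os.path.join(src, sub, "bar.jpg"), "wb").close()
    out = os.path.join(tmp, "test.tar.gz")
    compress_tar(out, src + "/", [os.path.join(src, "thing"), os.path.join(src, "lemon")])
    with tarfile.open(out) as archive:
        files = [m.name for m in archive.getmembers() if m.isfile()]

print(files)  # ['test1/keep/bar.jpg']
```

Returning None for a directory also skips everything below it, because tarfile.add() returns before recursing into a directory its filter rejected. functools.partial does the same job as the lambda; it simply makes the bound arguments explicit.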

Upvotes: 0
