Reputation: 55
I would like to filter out (skip) subdirectories while creating a tar(gz) file with tarfile (Python 3.4).
Files on disk:
I tried to compress /home/myuser/temp/test1/ with tarfile.add().
I use both full-path and short-path modes. With the full path it's OK, but with the short path I have this problem: directory exclusion doesn't work, because the filter method receives a TarInfo whose name is built from tarfile.add()'s arcname parameter, not from its name parameter!
archive.add(entry, arcname=os.path.basename(entry), filter=self.filter_general)
Example:
file: /home/myuser/temp/test1/thing/bar.jpg
-> arcname = test1/thing/bar.jpg
Because /home/myuser/temp/test1/thing is an element of exclude_dir_fullpath, the filter method should exclude this file, but it cannot, because the filter method gets test1/thing/bar.jpg.
How can I access tarfile.add()'s 'name' parameter inside the filter method?
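import os
import tarfile

def filter_general(item):
    # item is a TarInfo; item.name is derived from arcname, not from the on-disk path
    exclude_dir_fullpath = ['/home/myuser/temp/test1/thing', '/home/myuser/temp/test1/lemon']
    if any(dirname in item.name for dirname in exclude_dir_fullpath):
        print("Exclude fullpath dir matched at: %s" % item.name)  # DEBUG
        return None
    return item

def compress_tar():
    filepath = '/tmp/test.tar.gz'
    # No trailing slash here, otherwise os.path.basename() returns '' instead of 'test1'
    include_dir = '/home/myuser/temp/test1'
    # The with block closes the archive so the gzip stream is flushed to disk
    with tarfile.open(name=filepath, mode="w:gz") as archive:
        archive.add(include_dir, arcname=os.path.basename(include_dir), filter=filter_general)

compress_tar()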
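To make the problem easy to reproduce, here is a minimal sketch that records what the filter actually sees. The temporary directory layout and the helper name collect_filter_names are made up for illustration; only tarfile.add() and its filter argument come from the code above.

```python
import os
import tarfile
import tempfile

def collect_filter_names(include_dir):
    """Archive include_dir under its base name and return the names the filter saw."""
    seen = []

    def recording_filter(item):
        seen.append(item.name)  # TarInfo.name is derived from arcname
        return item

    # Write the archive outside include_dir so it does not archive itself
    with tempfile.TemporaryDirectory() as out_dir:
        archive_path = os.path.join(out_dir, "test.tar.gz")
        with tarfile.open(archive_path, mode="w:gz") as archive:
            archive.add(include_dir,
                        arcname=os.path.basename(include_dir),
                        filter=recording_filter)
    return seen

# Hypothetical throwaway tree: <tmp>/test1/thing/bar.jpg
with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "test1")
    os.makedirs(os.path.join(root, "thing"))
    with open(os.path.join(root, "thing", "bar.jpg"), "w") as handle:
        handle.write("x")
    names = collect_filter_names(root)
    print(names)
```

Every recorded name starts with test1, never with the absolute path, which is why a check against full paths never matches.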
Upvotes: 0
Views: 476
Reputation: 140178
You want to create a general, reusable function that filters out files based on their absolute path. I understand that filtering on the archive name is not enough, since whether a file should be included can depend on where it originally came from.
First, add a parameter to your filter function:
def filter_general(item, root_dir):
    # rebuild the on-disk path, then test full_path instead of item.name
    full_path = os.path.join(root_dir, item.name)
Then replace your "add to archive" line with:
archive.add(include_dir, arcname=os.path.basename(include_dir), filter=lambda x: filter_general(x, os.path.dirname(include_dir)))
The filter function has been replaced by a lambda that passes the parent directory of the include directory as root_dir. Passing include_dir itself would repeat the root directory name, since item.name already starts with it.
Now your filter function knows the root dir and you can filter by absolute path, allowing you to reuse your filter function in several locations in your code.
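Putting the pieces together, a possible end-to-end sketch looks like this. The exclude_dirs parameter, the prefix-based matching (instead of the substring test in the question), and the temporary test tree are my own assumptions for illustration, not part of the original code.

```python
import os
import tarfile
import tempfile

def filter_general(item, root_dir, exclude_dirs):
    """Return None for members whose rebuilt absolute path lies in an excluded dir."""
    full_path = os.path.normpath(os.path.join(root_dir, item.name))
    for excluded in exclude_dirs:
        excluded = os.path.normpath(excluded)
        # Assumption: match the directory itself or anything below it
        if full_path == excluded or full_path.startswith(excluded + os.sep):
            return None
    return item

def compress_tar(include_dir, archive_path, exclude_dirs):
    include_dir = os.path.normpath(include_dir)  # drops any trailing slash
    root_dir = os.path.dirname(include_dir)      # parent of the archived directory
    with tarfile.open(archive_path, mode="w:gz") as archive:
        archive.add(include_dir,
                    arcname=os.path.basename(include_dir),
                    filter=lambda x: filter_general(x, root_dir, exclude_dirs))

# Hypothetical test tree: <tmp>/test1/{thing,lemon,keep}/bar.jpg
with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "test1")
    for sub in ("thing", "lemon", "keep"):
        os.makedirs(os.path.join(root, sub))
        with open(os.path.join(root, sub, "bar.jpg"), "w") as handle:
            handle.write("x")
    archive_path = os.path.join(tmp, "test.tar.gz")
    compress_tar(root, archive_path,
                 [os.path.join(root, "thing"), os.path.join(root, "lemon")])
    with tarfile.open(archive_path) as archive:
        members = sorted(archive.getnames())
    print(members)  # ['test1', 'test1/keep', 'test1/keep/bar.jpg']
```

When the filter returns None for a directory, tarfile.add() does not descend into it, so excluding thing and lemon also skips everything below them.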
Upvotes: 0