Reputation: 9681
I have written the following script that allows me to compress a src
(which can be either a single file or a directory) to target 'dst':
#!/usr/bin/env python2
import tarfile
from ntpath import basename, dirname
from os import path, listdir, makedirs, chdir
import errno
import sys

class Archivator:
    @staticmethod
    def compress(src='input/test', dst='output'):
        # if not path.isfile(src_file):
        #     print('Expecting absolute path to file (not directory) as "src". If "src" does contain a file, the file does not exist')
        #     return False
        if not path.isdir(dst):
            return False
        # try:
        #     makedirs(dst_dir)
        # except OSError as err:
        #     if err.errno != errno.EEXIST:
        #         return False
        filename = basename(src) if path.isdir(src) else src
        tar_file = dst + '/' + filename + '.tar.gz'
        print(tar_file)
        print(src)
        with tarfile.open(tar_file, 'w:gz') as tar:
            print('Creating archive "' + tar_file + '"')
            # chdir(dirname(dst_dir))
            recr = path.isdir(src)
            if recr:
                print('Source is a directory. Will compress all contents using recursion')
            tar.add(src, recursive=recr)
        return True

if __name__ == '__main__':
    import argparse
    parser = argparse.ArgumentParser(description='Uses tar to compress file')
    parser.add_argument('-src', '--source', type=str,
                        help='Absolute path to file (not directory) that will be compressed')
    parser.add_argument('-dst', '--destination', type=str, default='output/',
                        help='Path to output directory. Create archive inside the directory will have the same name as value of "--src" argument')
    # Generate configuration
    config = parser.parse_args()
    Archivator.compress(config.source, config.destination)
For single files I haven't had an issue so far. However, while compressing src as a directory does work (recursion and all), I have noticed a very annoying issue: the complete directory structure is replicated inside the tar.gz archive.
Example:
Let's say I have the following file structure:
./
|---compression.py (script above)
|
|---updates/
|   |
|   |---package1/
|       |
|       |---file1
|       |---file2
|       |---dir/
|           |
|           |---file3
|
|---compressed/
With src = 'updates/package1' and dst = 'compressed', I expect the resulting archive to:
1. be placed inside dst (this works)
2. contain file1 and file2 (plus dir/file3) at its top level, without the parent directories
Regarding the second point, I expect:
./
|---compression.py (script above)
|
|---updates/
|   |
|   |---package1/
|       |
|       |---file1
|       |---file2
|       |---dir/
|           |
|           |---file3
|
|---compressed/
    |
    |---package1.tar.gz
        |
        |---file1
        |---file2
        |---dir/
            |
            |---file3
but instead I get
./
|---compression.py (script above)
|
|---updates/
|   |
|   |---package1/
|       |
|       |---file1
|       |---file2
|       |---dir/
|           |
|           |---file3
|
|---compressed/
    |
    |---package1.tar.gz
        |
        |---updates/
            |
            |---package1/
                |
                |---file1
                |---file2
                |---dir/
                    |
                    |---file3
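For reference, a minimal sketch that reproduces this in isolation, assuming Python 3 and a throwaway temporary directory standing in for the project root. When tar.add() is called without arcname, each member is stored under the path that was passed in, so 'updates/package1' becomes a prefix of every name in the archive.

```python
import os
import tarfile
import tempfile

# Throwaway copy of the question's layout: updates/package1/{file1, file2, dir/file3}
work = tempfile.mkdtemp()
src = os.path.join(work, "updates", "package1")
os.makedirs(os.path.join(src, "dir"))
for rel in ("file1", "file2", os.path.join("dir", "file3")):
    with open(os.path.join(src, rel), "w") as f:
        f.write(rel)

# Work relative to the fake project root, as the question does
os.chdir(work)
with tarfile.open("package1.tar.gz", "w:gz") as tar:
    # No arcname: member names are taken from the path passed in
    tar.add("updates/package1")

with tarfile.open("package1.tar.gz") as tar:
    names = sorted(tar.getnames())
print(names)
# ['updates/package1', 'updates/package1/dir', 'updates/package1/dir/file3',
#  'updates/package1/file1', 'updates/package1/file2']
```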
While the solution might be really trivial, I can't seem to figure it out. I even tried chdir-ing into src (if it is a directory), but it didn't work. Some of my experiments even led to an OSError (due to a directory missing where it was expected) and a corrupted archive.
Upvotes: 1
Views: 3458
Reputation: 1
I basically used .replace to strip the base folder path from each file's path and passed the result as arcname.
import os
import tarfile

with tarfile.open(tar_path, tar_compression) as tar_handle:
    for root, dirs, files in os.walk(test_data_path):
        for file in files:
            # Remove the base folder from the name stored in the archive
            tar_handle.add(os.path.join(root, file), arcname=os.path.join(root, file).replace(test_data_path, ""))
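A self-contained sketch of the same idea, assuming Python 3. It swaps .replace for os.path.relpath, which strips only the leading base path (str.replace would also remove any later occurrence of that string inside a path). The function name tar_dir_contents and the temporary demo tree are hypothetical.

```python
import os
import tarfile
import tempfile

def tar_dir_contents(src_dir, tar_path, mode="w:gz"):
    """Hypothetical helper: archive every file under src_dir, named relative to src_dir."""
    with tarfile.open(tar_path, mode) as tar:
        for root, dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                # relpath removes only the leading src_dir component
                tar.add(full, arcname=os.path.relpath(full, src_dir))

# Demo on a throwaway tree: src/{file1, file2, dir/file3}
src = tempfile.mkdtemp()
os.makedirs(os.path.join(src, "dir"))
for rel in ("file1", "file2", os.path.join("dir", "file3")):
    with open(os.path.join(src, rel), "w") as f:
        f.write(rel)

out = os.path.join(tempfile.mkdtemp(), "package1.tar.gz")
tar_dir_contents(src, out)
with tarfile.open(out) as tar:
    names = sorted(tar.getnames())
print(names)  # ['dir/file3', 'file1', 'file2']
```

As with the loop above, only files are added, so empty directories are not stored and "dir" appears only implicitly through "dir/file3".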
Upvotes: 0
Reputation: 20206
First, you are using the recursive parameter unnecessarily.
According to the docstring of tarfile.TarFile.add:
def add(self, name, arcname=None, recursive=True, exclude=None):
    """Add the file `name' to the archive. `name' may be any type of file
       (directory, fifo, symbolic link, etc.). If given, `arcname'
       specifies an alternative name for the file in the archive.
       Directories are added recursively by default. This can be avoided by
       setting `recursive' to False. `exclude' is a function that should
       return True for each filename to be excluded.
    """
You can use arcname to specify an alternative name for the file in the archive, and recursive controls whether the contents of a directory are added as well (it is True by default). In other words, tarfile can add a directory directly, without you walking it yourself.
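To see both parameters in action, a small sketch assuming Python 3 and a temporary directory; the directory name pkg and the helper members() are hypothetical.

```python
import os
import tarfile
import tempfile

# Throwaway directory with one file: pkg/file1
base = tempfile.mkdtemp()
pkg = os.path.join(base, "pkg")
os.makedirs(pkg)
with open(os.path.join(pkg, "file1"), "w") as f:
    f.write("data")

def members(**add_kwargs):
    """Add pkg to a fresh archive with the given keyword args; return sorted member names."""
    path = os.path.join(base, "out.tar")
    with tarfile.open(path, "w") as tar:
        tar.add(pkg, **add_kwargs)
    with tarfile.open(path) as tar:
        return sorted(tar.getnames())

renamed = members(arcname="package1")                     # stored as package1/, contents included
only_dir = members(arcname="package1", recursive=False)   # directory entry only, no contents
print(renamed)   # ['package1', 'package1/file1']
print(only_dir)  # ['package1']
```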
Back to your question: you can add each file manually and specify its arcname, for example tar.add("updates/package1/file1", "file1").
Or you can set arcname to an empty string, which omits the root directory from the stored names.
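Applying the first suggestion to the question's layout, a sketch of a hypothetical compress_contents() (a rework for illustration, not the asker's original function), assuming Python 3. It adds each top-level entry of src with arcname set to that entry's own name, so the archive root holds file1, file2 and dir/.

```python
import os
import tarfile
import tempfile

def compress_contents(src, dst):
    """Hypothetical rework of compress(): put src's contents at the archive root.

    Writes dst/<basename of src>.tar.gz and returns its path, or None if dst is missing.
    """
    if not os.path.isdir(dst):
        return None
    src = os.path.normpath(src)  # drop a trailing slash so basename() is not empty
    tar_path = os.path.join(dst, os.path.basename(src) + ".tar.gz")
    with tarfile.open(tar_path, "w:gz") as tar:
        if os.path.isdir(src):
            # Each entry keeps only its own name; tar.add recurses into subdirectories
            for entry in sorted(os.listdir(src)):
                tar.add(os.path.join(src, entry), arcname=entry)
        else:
            tar.add(src, arcname=os.path.basename(src))
    return tar_path

# Demo: rebuild updates/package1 and compressed/ in a temporary directory
work = tempfile.mkdtemp()
src = os.path.join(work, "updates", "package1")
os.makedirs(os.path.join(src, "dir"))
for rel in ("file1", "file2", os.path.join("dir", "file3")):
    with open(os.path.join(src, rel), "w") as f:
        f.write(rel)
dst = os.path.join(work, "compressed")
os.makedirs(dst)

archive = compress_contents(src, dst)
with tarfile.open(archive) as tar:
    names = sorted(tar.getnames())
print(os.path.basename(archive), names)  # package1.tar.gz ['dir', 'dir/file3', 'file1', 'file2']
```

This avoids both the updates/package1/ prefix and any chdir() calls, since tar.add() handles the recursion into dir/ by itself.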
Upvotes: 1