Reputation: 53
I'm looking for the fastest way to compress a big folder (around 30 GB) in Python. The compression level is not the priority. I have tried some libraries like tarfile and zipfile. Do you know any other libraries? My script runs on Linux: are Linux commands like gzip, bzip2, or xz faster? Any advice is welcome.
Thanks
Upvotes: 2
Views: 4553
Reputation: 403
I needed to compress large directories in zip format, so I compared the time taken by a bash command and by Python code. Linux command:
zip -r /home/dir_1.zip /home/dir_1
Python command:
shutil.make_archive('/home/dir_1', 'zip', '/home/dir_1')
Since the time to compress the same folder was comparable, I decided to use the Python function with multiple cores (Pool(4)) in order to compress multiple folders at the same time:
import shutil
from pathlib import Path
from multiprocessing import Pool
from tqdm import tqdm

list_dir_paths = ['/home/dir_1', '/home/dir_2', '/home/dir_3', '/home/dir_4']

def zip_function(i):
    # Zip the i-th directory into <dir>.zip next to it
    dir_path = Path(list_dir_paths[i])
    try:
        if dir_path.is_dir():
            shutil.make_archive(str(dir_path), 'zip', str(dir_path))
            print('[ZIPPED]', dir_path)
        else:
            print('[NOT DIR]', dir_path)
    except Exception as e:
        print('[ERROR]', dir_path)
        print(e)

if __name__ == '__main__':
    with Pool(4) as p:
        # list() drains the lazy imap iterator so tqdm can show progress
        r = list(
            tqdm(
                p.imap(zip_function, range(len(list_dir_paths))),
                desc='Zipping directories: ',
                total=len(list_dir_paths)
            )
        )
In a similar way, you could use another function to zip multiple files in parallel.
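As a rough sketch of that per-file variant, the snippet below gzips several files at once with a process pool. The function names `gzip_file` and `gzip_files` are hypothetical, and `compresslevel=1` is an assumption that favours speed over size, in line with the question. It has not been benchmarked against the directory version above.

```python
import gzip
import shutil
from multiprocessing import Pool
from pathlib import Path


def gzip_file(path):
    """Compress one file to <path>.gz and return the new file's path."""
    src = Path(path)
    dst = src.with_name(src.name + ".gz")
    # compresslevel=1 is the fastest (and least compact) gzip setting
    with open(src, "rb") as f_in, gzip.open(dst, "wb", compresslevel=1) as f_out:
        shutil.copyfileobj(f_in, f_out)
    return str(dst)


def gzip_files(paths, workers=4):
    """Compress several files concurrently, spreading them over worker processes."""
    with Pool(workers) as pool:
        return pool.map(gzip_file, paths)
```

Call `gzip_files([...])` with your own list of file paths from inside an `if __name__ == '__main__':` block, as in the directory example, so that worker processes can import the module safely.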
Upvotes: 0
Reputation: 1101
Here is some sample code that asks for the folder to zip using a tkinter dialog, then zips it into a dated subdirectory of the target directory. Hope it helps.
import os
import shlex
import time
from tkinter.filedialog import askdirectory

source1 = askdirectory()  # Source directory, picked in a dialog
print(source1)
source = [str(source1)]
target_dir = '/Users/Dlucidone/Documents/'  # Target directory
if not os.path.exists(target_dir):
    os.mkdir(target_dir)
# Archives go into a per-day subdirectory named YYYYMMDD
today = target_dir + os.sep + time.strftime('%Y%m%d')
comment = "zippingDir"  # input('Enter a comment --> ')
if len(comment) == 0:
    target = today + os.sep + '.zip'
else:
    target = today + os.sep + '_' + \
        comment.replace(' ', '_') + '.zip'
if not os.path.exists(today):
    os.mkdir(today)
    print('Successfully created directory', today)
# Quote each path so that spaces survive the shell
zip_command = "zip -r {0} {1}".format(
    shlex.quote(target), ' '.join(shlex.quote(s) for s in source))
print("Zip command is:")
print(zip_command)
print("Running:")
if os.system(zip_command) == 0:
    print('Successful backup to', target)
else:
    print('Backup FAILED')
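When a command string goes through the shell, every path in it has to be quoted correctly. One alternative, sketched here, is to pass the arguments as a list to `subprocess.run`, which skips the shell entirely. The helper names `build_zip_command` and `run_zip` are hypothetical, and the sketch assumes the `zip` program is installed and on the PATH.

```python
import subprocess


def build_zip_command(target, sources):
    """Return the zip invocation as an argument list (no shell involved)."""
    return ["zip", "-r", str(target), *map(str, sources)]


def run_zip(target, sources):
    """Run zip and report whether it exited with status 0."""
    result = subprocess.run(build_zip_command(target, sources))
    return result.returncode == 0
```

With this form, a source path such as `/tmp/my dir` reaches `zip` as a single argument without any extra quoting.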
Upvotes: 1
Reputation: 1678
I'd suggest using the native Linux commands and comparing their execution time. For example, use the native tar command.
import time
import os
start = time.time()
os.system("tar -cvf name.tar /path/to/directory")
end = time.time()
print("Elapsed time: %s"%(end - start,))
Note that tar by itself does no compression. To reduce the file size, you should also use gzip.
import time
import os
start = time.time()
os.system("tar -cvf name.tar /path/to/directory")
os.system("gzip name.tar")
end = time.time()
print("Elapsed time: %s"%(end - start,))
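For comparison, the archive and compression steps can also be done in a single pass from Python with the standard tarfile module (on the command line, `tar -czf` does the same). The sketch below is only an illustration: `make_tar_gz` is a hypothetical helper name, and `compresslevel=1` is an assumption that trades size for speed. Whether it beats the native commands on a 30 GB folder would need to be measured.

```python
import tarfile
from pathlib import Path


def make_tar_gz(source_dir, archive_path, compresslevel=1):
    """Write source_dir into a gzip-compressed tar archive in one pass."""
    source = Path(source_dir)
    with tarfile.open(archive_path, "w:gz", compresslevel=compresslevel) as tar:
        # arcname stores entries under the directory's own name
        # instead of its full absolute path
        tar.add(str(source), arcname=source.name)
    return archive_path
```

It can be timed the same way as the commands above, by wrapping the call between two `time.time()` readings.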
Upvotes: 3