Christophe Biguereau

Reputation: 53

Faster way to compress a folder in Python

I'm looking for the fastest way to compress a large folder (around 30 GB) in Python. The compression level is not the priority. I have tried libraries such as tarfile and zipfile. Do you know of any other libraries? My script runs on Linux; are Linux commands like gzip, bzip2 or xz faster? Any advice is welcome.

Thanks

Upvotes: 2

Views: 4553

Answers (3)

miriam mazzeo

Reputation: 403

I needed to compress large directories in zip format, so I compared the running time of a shell command with that of Python code. Linux command:

zip -r /home/dir_1.zip /home/dir_1

Python command:

shutil.make_archive('/home/dir_1', 'zip', '/home/dir_1')

Since the time to compress the same folder was comparable, I decided to use the Python function with multiple processes (Pool(4)) in order to compress several folders at the same time:

import shutil
from pathlib import Path
from multiprocessing import Pool
from tqdm import tqdm

list_dir_paths = ['/home/dir_1', '/home/dir_2','/home/dir_3','/home/dir_4']

def zip_function(i):
    dir_path = Path(list_dir_paths[i])
    try:
        if dir_path.is_dir():
            shutil.make_archive(dir_path, 'zip', dir_path)
            print('[ZIPPED]' , dir_path)
        else:
            print('[NOT DIR]', dir_path)
    except Exception as e:
        print('[ERROR]', dir_path)
        print(e)


if __name__ == '__main__':
    with Pool(4) as p:
        r = list(
            tqdm(
                p.imap(zip_function, range(len(list_dir_paths))),
                desc='Zipping directories: ',
                total=len(list_dir_paths)
            )
        )

In a similar way, you could use another function to zip multiple files in parallel.
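As a rough illustration of zipping individual files in parallel, here is a minimal sketch. The helper names `zip_one_file` and `zip_files_in_parallel` are hypothetical, and each file is written to its own `<name>.zip` next to the original. Note that it swaps in `multiprocessing.pool.ThreadPool` (same interface as `Pool`) instead of worker processes; CPython's zlib generally releases the GIL while compressing, so threads can still overlap work, but you should measure this on your own data.

```python
import zipfile
from multiprocessing.pool import ThreadPool
from pathlib import Path


def zip_one_file(file_path):
    """Compress one file into '<name>.zip' next to it; return the archive path."""
    file_path = Path(file_path)
    archive_path = file_path.with_name(file_path.name + '.zip')
    with zipfile.ZipFile(archive_path, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname stores only the file name in the archive, not the full path
        zf.write(file_path, arcname=file_path.name)
    return archive_path


def zip_files_in_parallel(file_paths, workers=4):
    """Zip every file in file_paths with a pool of worker threads."""
    with ThreadPool(workers) as pool:
        # map() keeps the results in the same order as the input paths
        return pool.map(zip_one_file, file_paths)
```

Calling `zip_files_in_parallel(['/home/dir_1/a.log', '/home/dir_1/b.log'])` (paths are placeholders) would return the two archive paths in input order.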

Upvotes: 0

Dlucidone

Reputation: 1101

Here is some sample code that asks for the folder to zip using the tkinter library and zips it into the target directory. Hope it helps.

from tkinter import *
from tkinter.filedialog import askdirectory
import os
import time

source1 = askdirectory()  # Source directory
print(source1)
source = [str(source1)]
target_dir = '/Users/Dlucidone/Documents/'  # Target directory

if not os.path.exists(target_dir):
  os.mkdir(target_dir)

today = target_dir + os.sep + time.strftime('%Y%m%d')
comment = "zippingDir"#input('Enter a comment --> ')
if len(comment) == 0:
  target = today + os.sep  + '.zip'
else:
  target = today + os.sep  + '_' + \
     comment.replace(' ', '_') + '.zip'
if not os.path.exists(today):
  os.mkdir(today)
  print('Successfully created directory', today)

zip_command = "zip -r {0} {1}".format(target,' '.join(source))

print("Zip command is:")
print(zip_command)
print("Running:")
if os.system(zip_command) == 0:
  print('Successful backup to', target)
else:
  print('Backup FAILED')
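One caveat with the script above is that `os.system` passes a single string to the shell, so a source folder whose name contains spaces would be split into several arguments. A possible variant, sketched here with hypothetical helper names (`build_zip_command`, `run_backup`), passes the arguments as a list to `subprocess.run` instead. It assumes the `zip` command is installed and keeps the same dated-folder naming as the original.

```python
import subprocess
import time
from pathlib import Path


def build_zip_command(source_dir, target_dir, comment='zippingDir'):
    """Return (target_path, argv) for a 'zip -r' call without running it."""
    day_dir = Path(target_dir) / time.strftime('%Y%m%d')
    target = day_dir / ('_' + comment.replace(' ', '_') + '.zip')
    # A list of arguments bypasses the shell, so spaces need no quoting
    return target, ['zip', '-r', str(target), str(source_dir)]


def run_backup(source_dir, target_dir, comment='zippingDir'):
    """Create the dated directory, run zip, and return True on success."""
    target, argv = build_zip_command(source_dir, target_dir, comment)
    target.parent.mkdir(parents=True, exist_ok=True)
    result = subprocess.run(argv)
    return result.returncode == 0
```

Splitting the command construction from the execution also makes it easy to print or inspect the command before running it, as the original script does.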

Upvotes: 1

wewa

Reputation: 1678

I'd suggest using the native Linux commands and comparing the execution time. For example, use the native tar command.

import time
import os

start = time.time()

os.system("tar -cvf name.tar /path/to/directory")

end = time.time()
print("Elapsed time: %s"%(end - start,))

Note, however, that tar on its own does no compression. To reduce the file size, you should use gzip.

import time
import os

start = time.time()

os.system("tar -cvf name.tar /path/to/directory")
os.system("gzip name.tar")

end = time.time()
print("Elapsed time: %s"%(end - start,))
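For comparison with the two-step tar plus gzip approach, here is a minimal pure-Python sketch that archives and gzip-compresses in one pass with the standard `tarfile` module. The function names `tar_gz_directory` and `timed` are hypothetical. It uses `compresslevel=1`, on the assumption that speed matters more than size, as stated in the question; whether it beats the command-line tools is something to measure, not something this sketch guarantees.

```python
import tarfile
import time
from pathlib import Path


def tar_gz_directory(source_dir, archive_path, compresslevel=1):
    """Write source_dir into a gzip-compressed tar archive in a single pass."""
    source_dir = Path(source_dir)
    # compresslevel ranges from 1 (fastest) to 9 (smallest output)
    with tarfile.open(archive_path, 'w:gz', compresslevel=compresslevel) as tar:
        # arcname stores paths relative to the directory name, not the full path
        tar.add(source_dir, arcname=source_dir.name)
    return archive_path


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start
```

For example, `timed(tar_gz_directory, '/path/to/directory', 'name.tar.gz')` returns the archive path together with the elapsed time, which can be compared directly with the timings printed by the scripts above.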

Upvotes: 3
