Reputation: 3213
I have a working function that compresses multiple files into one zip file:
import os
import zipfile

targetzipfile = os.path.normpath(targetfolder) + '.zip'
zipf = zipfile.ZipFile(targetzipfile, 'w', zipfile.ZIP_DEFLATED, allowZip64=True)
for root, dirs, files in os.walk(targetfolder):
    for f in files:
        # zipfile.write(absfilename, archivename): the archive name is
        # the name shown inside the zip file, so use a relative path
        print "compressing: %s" % os.path.join(root, f)
        zipf.write(os.path.join(root, f),
                   os.path.relpath(os.path.join(root, f),
                                   os.path.dirname(os.path.normpath(targetfolder))))
zipf.close()
But it is very slow to run since I have lots of files. So I'm looking for a way to optimize this loop with multiprocessing in Python, something like OpenMP.
Thanks for any advice.
Upvotes: 2
Views: 8229
Reputation: 9791
I doubt that multiprocessing would help much here.
The zipfile module in the Python stdlib is not thread safe, so every write to the archive has to be serialized anyway.
So how shall we optimize your code?
ALWAYS profile before and while performing optimizations.
Since I don't know your file structure, I'll take the Python source tree as an example.
$ time python singleprocess.py
python singleprocess.py 2.31s user 0.22s system 100% cpu 2.525 total
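The shell's time only gives a total; for a per-function breakdown inside Python, the stdlib cProfile module can be used. A minimal sketch, with a placeholder function standing in for the zip loop:

```python
import cProfile
import pstats

def work():
    # placeholder for the compression loop you actually want to measure
    total = 0
    for i in range(100000):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()
# show the five most expensive calls by cumulative time
pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```

Replace work() with a call to your zip loop to see whether the time goes into reading files or into zlib compression.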
Then let's try the zip command shipped with Ubuntu (Info-ZIP).
You can specify the compression level for the zip command: -1 indicates the fastest compression speed (less compression) and -9 indicates the slowest compression speed (best compression). The default compression level is -6.
$ time zip python.zip Python-2.7.6 -r -q
zip python.zip Python-2.7.6 -r -q 2.02s user 0.11s system 99% cpu 2.130 total
$ time zip python.zip Python-2.7.6 -r -q -1
zip python.zip Python-2.7.6 -r -q -1 1.00s user 0.11s system 99% cpu 1.114 total
$ time zip python.zip Python-2.7.6 -r -q -9
zip python.zip Python-2.7.6 -r -q -9 4.92s user 0.11s system 99% cpu 5.034 total
You see, the performance of Python's zlib module is very competitive. But dedicated zip tools can give you more control over the compression strategy.
You can call those external commands with the subprocess module in Python.
Besides, when you use the Python code above to zip a directory, you'll lose the metadata (permission bits, last access time, last modification time ...) of the directory and its subdirectories.
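If that metadata matters, one workaround (switching formats entirely, not something your zip-based code does) is the stdlib tarfile module, which does record permission bits and modification times. A minimal sketch:

```python
import os
import tarfile

def make_tarball(targetfolder):
    # tar+gzip instead of zip; tarfile preserves Unix permission
    # bits and mtimes that zipfile.write drops for directories
    name = os.path.normpath(targetfolder) + ".tar.gz"
    with tarfile.open(name, "w:gz") as tar:
        # arcname keeps paths relative to the folder's parent
        tar.add(targetfolder,
                arcname=os.path.basename(os.path.normpath(targetfolder)))
    return name
```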
Upvotes: 2
Reputation: 750
import multiprocessing
import os
import zipfile

targetfolder = "targetFolder"

# collect the files to archive up front
data = []
for root, dirs, files in os.walk(targetfolder):
    for f in files:
        data.append(os.path.join(root, f))

def mp_worker(filepath):
    # read the file in a worker process; ZipFile is not safe to
    # share between processes, so the write happens in the parent
    print "compressing: %s" % filepath
    with open(filepath, 'rb') as infile:
        contents = infile.read()
    return filepath, contents

def mp_handler():
    basedir = os.path.dirname(os.path.normpath(targetfolder))
    zipf = zipfile.ZipFile(targetfolder + '.zip', 'w',
                           zipfile.ZIP_DEFLATED, allowZip64=True)
    p = multiprocessing.Pool(2)
    # only the disk reads run in parallel; the compression and the
    # archive writes still happen sequentially in the parent process
    for filepath, contents in p.imap(mp_worker, data):
        zipf.writestr(os.path.relpath(filepath, basedir), contents)
        print " Process %s\tDONE" % filepath
    zipf.close()

if __name__ == '__main__':
    mp_handler()
For more on the Pool pattern, you can refer to Python Module of the Week.
Upvotes: -1