Reputation: 1387
I am trying to write a Python script that extracts a zip file:
Board: BeagleBone Black, ~1 GHz ARM Cortex-A8, Debian wheezy
Zipfile: /home/milo/my.zip, ~ 8 MB
>>> from zipfile import ZipFile
>>> zip = ZipFile("/home/milo/my.zip")
>>> zip.extractall(pwd="tst")
Other approaches, such as opening the zip file and reading and writing its contents myself, or extracting only a single file, have the same effect: extraction takes about 3-4 minutes.
Extracting the same file with the unzip tool takes less than 2 seconds.
Does anyone know what is wrong with my code, or even with Python's zipfile library?
Thanks, Ajava
Upvotes: 6
Views: 3730
Reputation: 548
Copied from my answer at https://stackoverflow.com/a/72513075/10860732
It's quite stupid that Python doesn't implement zip decryption in C.
So I implemented it in Cython, which is 17 times faster.
Just get the dezip.pyx and setup.py from this gist.
https://gist.github.com/zylo117/cb2794c84b459eba301df7b82ddbc1ec
Then install Cython and build the Cython extension:
pip3 install cython
python3 setup.py build_ext --inplace
Then run the original script with two more lines.
import zipfile
# add these two lines
from dezip import _ZipDecrypter_C
setattr(zipfile, '_ZipDecrypter', _ZipDecrypter_C)
z = zipfile.ZipFile('./test.zip', 'r')
z.extractall('/tmp/123', None, b'password')
Upvotes: 1
Reputation: 1458
This seems to be a documented issue with the zipfile module in Python 2.7. If you look at the documentation for zipfile, it clearly mentions:
Decryption is extremely slow as it is implemented in native Python rather than C.
If you need faster performance, you can either invoke an external program (like unzip or 7zip) from your code, or make sure the zip files you are working with are not password protected.
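A minimal sketch of both options, assuming Info-ZIP's unzip is installed and on the PATH. The helper names (has_encrypted_members, build_unzip_command, extract_with_unzip) are made up for illustration and are not part of zipfile. The first helper checks whether an archive is password protected at all; the others hand extraction to the external tool.

```python
import subprocess
import zipfile


def has_encrypted_members(zip_path):
    """Return True if any entry in the archive has its encryption flag set."""
    with zipfile.ZipFile(zip_path) as archive:
        # Bit 0 of the general-purpose flag marks an encrypted entry.
        return any(info.flag_bits & 0x1 for info in archive.infolist())


def build_unzip_command(zip_path, dest_dir, password=None):
    """Build the argument list for Info-ZIP's unzip (hypothetical helper)."""
    command = ["unzip", "-o", "-q"]  # -o: overwrite existing files, -q: quiet
    if password is not None:
        command += ["-P", password]  # password is passed on the command line
    command += [zip_path, "-d", dest_dir]  # -d: extraction directory
    return command


def extract_with_unzip(zip_path, dest_dir, password=None):
    """Run unzip; raises subprocess.CalledProcessError if it fails."""
    subprocess.check_call(build_unzip_command(zip_path, dest_dir, password))
```

For the archive in the question this would be called as extract_with_unzip("/home/milo/my.zip", "/home/milo/out", "tst"), where the output directory is only an example. Note that a password given with -P may be visible to other users on the machine through the process list, so this is best kept to cases where that is acceptable.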
Upvotes: 7