Btc Sources

Reputation: 2041

Fast zip decryption in python

I have a program that processes a zip file using zipfile. It works with an iterator, since the uncompressed file is bigger than 2 GB and loading it all at once could become a memory problem.

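import zipfile
from io import BytesIO

# my_file holds the raw bytes of the zip archive
with zipfile.ZipFile(BytesIO(my_file)) as myzip:
    for file_inside in myzip.namelist():
        with myzip.open(file_inside) as file:
            # Process here
            for line in file:
                ...  # per-line processing goes here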

Then I noticed that this was extremely slow on my file. I can understand that it may take some time, but it should at least use my machine's resources: say, the Python process should keep the core it runs on at 100%.

Since it doesn't, I started researching the possible root causes. I'm not an expert in compression matters, so I first considered the basics:

This made me think that the bottleneck could be in one of the less visible parameters: RAM bandwidth. However, I have no idea how I could measure this.

Then, on the software side, I found this in the zipfile docs:

Decryption is extremely slow as it is implemented in native Python rather than C.

I guess that if it's implemented in pure Python, it isn't getting any native (compiled) acceleration either, which is another source of slowness. I'm also curious about how this method works, again because of the low CPU usage.

So my question is, of course: how could I work in a similar way (without holding the full uncompressed file in RAM), but decompress faster in Python? Is there another library, or maybe another approach, to overcome this slowness?

Upvotes: 2

Views: 1876

Answers (3)

Carl Cheung

Reputation: 548

It's quite stupid that Python doesn't implement zip decryption in C.

So I implemented it in Cython, which is about 17 times faster.

Just get the dezip.pyx and setup.py from this gist.

https://gist.github.com/zylo117/cb2794c84b459eba301df7b82ddbc1ec

Then install Cython and build the extension:

pip3 install cython
python3 setup.py build_ext --inplace

Then run the original script with two more lines.

import zipfile

# add these two lines
from dezip import _ZipDecrypter_C
setattr(zipfile, '_ZipDecrypter', _ZipDecrypter_C)

z = zipfile.ZipFile('./test.zip', 'r')
z.extractall('/tmp/123', None, b'password')
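Since the question asks for streaming rather than extracting everything, here is a minimal sketch of reading each member in fixed-size chunks with the standard zipfile API. It should behave the same whether or not the patch above is applied; the helper name iter_member_chunks and the 1 MiB default chunk size are assumptions made for illustration, not part of the gist.

```python
import io
import zipfile
from typing import Iterator, Optional, Tuple


def iter_member_chunks(
    source,
    pwd: Optional[bytes] = None,
    chunk_size: int = 1 << 20,
) -> Iterator[Tuple[str, bytes]]:
    """Yield (member_name, chunk) pairs without loading a whole member.

    `source` is a path or a seekable file-like object. The function name
    and default chunk size are hypothetical, chosen for this sketch.
    """
    with zipfile.ZipFile(source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            # pwd is only consulted for encrypted members
            with archive.open(info, pwd=pwd) as member:
                while True:
                    chunk = member.read(chunk_size)
                    if not chunk:
                        break
                    yield info.filename, chunk


if __name__ == "__main__":
    # Demo on an unencrypted in-memory archive, because the standard
    # library cannot create encrypted zip files.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("a.txt", b"x" * 10)
    buf.seek(0)
    total = sum(len(chunk) for _, chunk in iter_member_chunks(buf, chunk_size=4))
    print(total)
```

For an encrypted archive you would pass the password as bytes, for example iter_member_chunks('./test.zip', pwd=b'password'). Memory use stays bounded by chunk_size regardless of how large a member is.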

Upvotes: 2

Danizavtz

Reputation: 3270

There is a Python library for zipping files without the memory hassle.

Quoted from the docs:

Buzon - ZipFly

ZipFly is a zip archive generator based on zipfile.py. It was created by Buzon.io to generate very large ZIP archives for immediate sending out to clients, or for writing large ZIP archives without memory inflation.

I've never used it, but it may help.

Upvotes: 1

Hari5000

Reputation: 401

I've done some research and found the following:

You could "pip install czipfile"; more information at https://pypi.org/project/czipfile/

Another solution is to use Cython, a superset of Python that compiles to C: https://www.reddit.com/r/Python/comments/cksvp/whats_a_python_zip_library_with_fast_decryption/

Or you could outsource to 7-Zip, as explained here: Faster alternative to Python's zipfile module?
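As a rough sketch of the 7-Zip route, the snippet below launches the 7z command-line tool and streams one member from its standard output in chunks, so the uncompressed data never sits fully in RAM. It assumes a 7z executable is on PATH; the helper names build_7z_command and stream_member are hypothetical. The x command, the -so switch (write to stdout) and the -p switch (password, with no space before the value) are standard 7-Zip options.

```python
import subprocess
from typing import Iterator, List


def build_7z_command(archive: str, member: str, password: str) -> List[str]:
    """Build the 7z invocation (hypothetical helper for this sketch)."""
    # "x" extracts, "-so" sends the extracted data to stdout,
    # "-p<password>" supplies the password.
    return ["7z", "x", "-so", f"-p{password}", archive, member]


def stream_member(
    archive: str,
    member: str,
    password: str,
    chunk_size: int = 1 << 20,
) -> Iterator[bytes]:
    """Yield the decrypted, decompressed bytes of one member in chunks."""
    cmd = build_7z_command(archive, member, password)
    # Note: the password is visible in the process list while 7z runs.
    with subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL
    ) as proc:
        while True:
            chunk = proc.stdout.read(chunk_size)
            if not chunk:
                break
            yield chunk
    # Leaving the with-block waits for 7z, so returncode is set here.
    if proc.returncode != 0:
        raise RuntimeError(f"7z exited with status {proc.returncode}")
```

Usage would look like: for chunk in stream_member('./test.zip', 'data.csv', 'password'): process(chunk), where the archive, member name and process function are placeholders.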

Upvotes: 1
