4rk

Reputation: 375

Compress large Python objects

I am trying to compress a huge Python object (~15 GB) and save it to disk. Due to requirement constraints I need to compress this file as much as possible. I am currently using zlib.compress() at level 9. My main concern is that the memory used during compression exceeds the 32 GB available on the system, and going forward the size of the object is expected to grow. Is there a more efficient or better way to achieve this? Thanks.

Update: Also note that the object I want to save is a sparse numpy matrix, and that I serialize the data before compressing it, which further increases memory consumption. Since I do not need the Python object once it has been serialized, would gc.collect() help?
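Roughly, what I am doing is the following (simplified; `matrix` and the file name stand in for the real object and path). At the peak, the matrix, the pickle, and the compressed output all exist in memory at once:

```python
import pickle
import zlib

# Serialize the sparse matrix; this creates a full byte-string copy
# alongside the matrix itself.
data = pickle.dumps(matrix, protocol=pickle.HIGHEST_PROTOCOL)

# Compress at the highest level; a third copy (the compressed bytes)
# exists before `data` can be freed.
compressed = zlib.compress(data, 9)

with open("matrix.pkl.z", "wb") as f:
    f.write(compressed)
```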

Upvotes: 1

Views: 1399

Answers (2)

Simon

Reputation: 10841

The memLevel parameter of deflateInit2() specifies how much memory is allocated for the internal compression state. The default is 8, the maximum is 9, and the minimum is 1 (see the zlib manual). If you have already tried that, or it does not help enough, it may be necessary to look at another compression algorithm or library instead.
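In Python, memLevel is exposed as a keyword argument of zlib.compressobj() (Python 3.3 and later). A minimal sketch of lowering it to reduce memory use (the file names are placeholders):

```python
import zlib

# memLevel=1 minimizes the memory zlib allocates for its internal
# compression state, at some cost in speed and ratio; level=9 keeps
# the strongest compression setting.
comp = zlib.compressobj(level=9, memLevel=1)

with open("data.pkl", "rb") as src, open("data.pkl.z", "wb") as dst:
    while True:
        chunk = src.read(1 << 20)  # 1 MiB at a time
        if not chunk:
            break
        dst.write(comp.compress(chunk))
    dst.write(comp.flush())
```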

Upvotes: 0

Ignacio Vazquez-Abrams

Reputation: 798456

Incremental (de)compression should be done with zlib.{de,}compressobj() so that memory consumption can be minimized. Additionally, higher compression ratios can be attained for most data by using bz2 instead.
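A minimal sketch combining both suggestions (`matrix` and the file name are placeholders): pickle.dump() writes to the file object in pieces, so streaming it through bz2.BZ2File means neither the full serialized byte string nor the full compressed output is ever held in memory at once.

```python
import bz2
import pickle

# BZ2File compresses incrementally as pickle streams the object into
# it, so only small buffers are held in memory at any time.
with bz2.BZ2File("matrix.pkl.bz2", "wb", compresslevel=9) as f:
    pickle.dump(matrix, f, protocol=pickle.HIGHEST_PROTOCOL)

# Decompression is likewise incremental.
with bz2.BZ2File("matrix.pkl.bz2", "rb") as f:
    matrix = pickle.load(f)
```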

Upvotes: 5
