Reputation: 375
I am trying to compress a huge Python object, ~15 GB, and save it to disk. Due to requirement constraints I need to compress this file as much as possible. I am presently using zlib.compress() at level 9. My main concern is that the memory taken during compression exceeds what I have available on the system (32 GB), and going forward the size of the object is expected to increase. Is there a more efficient/better way to achieve this? Thanks.
Update: Note that the object I want to save is a sparse numpy matrix, and that I am serializing the data before compressing it, which also increases memory consumption. Since I do not need the Python object after it is serialized, would gc.collect() help?
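For reference, a minimal sketch of the pipeline described above, under the assumption that pickle is the serializer (the question does not say which one is used) and with a small scipy matrix standing in for the real object:

    import pickle
    import zlib

    from scipy import sparse

    # Small stand-in for the real ~15 GB matrix.
    sparse_matrix = sparse.random(1000, 1000, density=0.01, format="csr")

    # Serializing first materializes a full byte-string copy of the data...
    serialized = pickle.dumps(sparse_matrix, protocol=pickle.HIGHEST_PROTOCOL)

    # ...and zlib.compress() then holds the entire compressed output in
    # memory as well, on top of the original object and the pickle.
    compressed = zlib.compress(serialized, 9)

    with open("matrix.zlib", "wb") as f:
        f.write(compressed)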
Upvotes: 1
Views: 1399
Reputation: 10841
The memLevel parameter of deflateInit2() specifies how much memory should be allocated for the internal compression state. The default is 8, the maximum is 9, and the minimum is 1 (see the zlib manual). If you've already tried that, or it doesn't help you enough, it might be necessary to look at another compression algorithm or library instead.
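In Python this parameter is exposed through zlib.compressobj(). A minimal sketch, with a hypothetical payload standing in for the serialized matrix (note that memLevel only sizes the internal deflate state, a few hundred kilobytes at most, so on its own it will not close a multi-gigabyte gap):

    import zlib

    # Hypothetical stand-in for the serialized object.
    data = b"example payload " * 1_000_000

    # memLevel ranges from 1 (least memory) to 9 (most); the default is 8.
    # Lower values shrink the internal deflate state at some cost in
    # compression speed and ratio.
    compressor = zlib.compressobj(level=9, memLevel=1)

    compressed = compressor.compress(data) + compressor.flush()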
Upvotes: 0
Reputation: 798456
Incremental (de)compression should be done with zlib.{de,}compressobj() so that memory consumption can be minimized. Additionally, higher compression ratios can be attained for most data by using bz2 instead.
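A minimal sketch combining both suggestions, streaming the pickle directly into a bz2 file (the filename and the use of pickle are assumptions, not from the answer):

    import bz2
    import pickle

    from scipy import sparse

    # Hypothetical stand-in for the real matrix.
    sparse_matrix = sparse.random(1000, 1000, density=0.01, format="csr")

    # pickle.dump() writes into the compressed file object as it goes, so
    # the whole pickle never has to exist as one in-memory byte string and
    # the compressed output is flushed to disk incrementally; bz2 typically
    # compresses better than zlib, at the cost of speed.
    with bz2.open("matrix.pkl.bz2", "wb", compresslevel=9) as f:
        pickle.dump(sparse_matrix, f, protocol=pickle.HIGHEST_PROTOCOL)

    # Decompress and deserialize incrementally on the way back in.
    with bz2.open("matrix.pkl.bz2", "rb") as f:
        restored = pickle.load(f)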
Upvotes: 5