pandas compression level and memory usage

Question

Greetings to the community

I have a simple question that I can probably answer myself, but I would really like the opinion of others.
We are developing a model (in Python) that uses a combination of Feather and HDF5 files to store results. We use pandas.
For now, I chose to use uncompressed files and the blosc:snappy algorithm, because we are more interested in keeping memory usage low during I/O operations than in saving disk space. In theory, higher compression means smaller files at the expense of longer read/write times and higher memory usage.
pandas offers numerous compression algorithms. Is my assumption that uncompressed files are better for memory usage correct for all of these algorithms?
If I am only interested in keeping memory usage low during reading and writing, is there really any point in using compression?
I can't find a single chart comparing memory usage across compression levels.
Thanks
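A comparison like the one asked for can be generated for your own data. The sketch below uses the standard library's zlib purely as a stand-in codec (it is not one of the Blosc or Feather codecs discussed here) and records, for each compression level, the compressed size, the elapsed time and the peak memory seen by tracemalloc while compressing and decompressing. Two caveats: tracemalloc only sees memory allocated through Python's own allocators, and tracing slows the calls down, so treat the timings as relative. The helper names `measure` and `compare_levels` are hypothetical.

```python
import time
import tracemalloc
import zlib


def measure(func, *args):
    """Call func(*args) and return (result, elapsed seconds, peak traced bytes)."""
    tracemalloc.start()
    try:
        start = time.perf_counter()
        result = func(*args)
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) since tracing started.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


def compare_levels(data, levels=range(10)):
    """Compress and decompress `data` at each zlib level and collect the costs."""
    rows = []
    for level in levels:
        packed, t_write, mem_write = measure(zlib.compress, data, level)
        unpacked, t_read, mem_read = measure(zlib.decompress, packed)
        assert unpacked == data  # sanity check: the round trip must be lossless
        rows.append({
            "level": level,
            "size": len(packed),
            "write_s": t_write,
            "write_peak": mem_write,
            "read_s": t_read,
            "read_peak": mem_read,
        })
    return rows


if __name__ == "__main__":
    # Made-up sample payload; substitute bytes taken from your own results.
    sample = b",".join(str(i % 1000).encode() for i in range(200_000))
    for row in compare_levels(sample):
        print(row)
```

Level 0 stores the data without compressing it, which gives the "uncompressed" baseline in the same table.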

Mark Adler · Accepted Answer

Only you can know the answer to your question, since it depends on how often you compress and decompress the data compared with the other activities in your application, and on how much the reduced memory usage improves speed by avoiding thrashing. No generic benchmark will give you insight into your problem.
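Measuring this in your own application means measuring at the process level: pyarrow (behind Feather) and PyTables/HDF5 allocate much of their memory in native code, which Python-level tools such as tracemalloc do not see. A minimal sketch, assuming a Unix system (the `resource` module is not available on Windows): run each read or write in a fresh interpreter and report that process's peak resident set size. The helper name `peak_rss_of` is hypothetical.

```python
import subprocess
import sys


def peak_rss_of(code: str) -> int:
    """Run `code` in a fresh interpreter and return that process's peak RSS.

    Unix only. ru_maxrss is reported in kilobytes on Linux and in bytes
    on macOS, so only compare numbers taken on the same platform.
    """
    script = (
        code
        + "\nimport resource"
        + "\nprint(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)"
    )
    proc = subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True,
        text=True,
        check=True,
    )
    # The last line of the child's stdout is the peak RSS it printed.
    return int(proc.stdout.strip().splitlines()[-1])


if __name__ == "__main__":
    baseline = peak_rss_of("pass")
    big = peak_rss_of('data = b"x" * (100 * 1024 * 1024)')
    print("baseline:", baseline, "with a 100 MiB buffer:", big)
```

To compare compression settings, pass the same read for each file variant, for example `"import pandas as pd\npd.read_feather('results_lz4.feather')"` (a hypothetical file name), and subtract the peak of a run that only imports pandas. A fresh process per measurement keeps one run's peak from hiding another's.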

I recommend experimenting with lz4 for your application.
