Reputation: 165
Greetings to the community
I have a simple question that I can probably answer myself, but I really want the opinion of others.
We are developing a model (in Python) that uses a combination of feather and hdf5 files to store results. We use pandas.
For now, I chose to use uncompressed files and the blosc:snappy algorithm because we are more interested in keeping memory usage low during I/O operations than in saving disk space.
In theory, higher compression means smaller files at the expense of read/write time and memory.
pandas offers numerous compression algorithms. So is my assumption, that uncompressed files are better for memory usage, correct for all of these algorithms?
If I am only interested in keeping memory usage low during reading/writing, is there really any point in using compression?
I can't find a single comparison chart for memory usage and compression level.
Thanks
Upvotes: 0
Views: 663
Reputation: 112404
Only you can know the answer to your question, since it depends on how often you are compressing and decompressing the data, compared to other activities in your application, and how much reduced memory usage improves speed due to the avoidance of thrashing. No generic benchmarks will give you insight into your problem.
I recommend experimenting with lz4 for your application.
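A minimal sketch of such an experiment, under stated assumptions: the `measure` helper and the synthetic `payload` are hypothetical, and the standard-library `zlib` and `lzma` codecs stand in for whatever your files actually use. It records wall time and peak Python-level memory for each candidate.

```python
import lzma
import time
import tracemalloc
import zlib


def measure(label, func, data):
    """Run func(data) once; report wall time and peak traced memory."""
    tracemalloc.start()
    start = time.perf_counter()
    out = func(data)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "label": label,
        "seconds": elapsed,
        "peak_bytes": peak,      # Python-level allocations only
        "out_bytes": len(out),   # size of the produced buffer
    }


# Hypothetical stand-in for real results: repetitive CSV-like bytes.
payload = b"timestamp,value\n" + b"2020-01-01,0.12345\n" * 50_000

results = [
    measure("copy (no compression)", bytearray, payload),
    measure("zlib level 1", lambda d: zlib.compress(d, 1), payload),
    measure("zlib level 9", lambda d: zlib.compress(d, 9), payload),
    measure("lzma default", lzma.compress, payload),
]

for r in results:
    print(f"{r['label']:<24} {r['seconds']:.4f} s  "
          f"peak={r['peak_bytes']:>9} B  out={r['out_bytes']:>9} B")
```

To test your real workload, replace the lambdas with your own write calls, for example `df.to_feather(path, compression="lz4")` or `df.to_hdf(path, key="results", complevel=5, complib="blosc:snappy")`, and drop the `len(out)` field since those calls return nothing. Note that `tracemalloc` only sees allocations made through Python's allocator. Buffers allocated inside pyarrow or HDF5 will not show up, so for those you would need a process-level measurement such as peak RSS, ideally with each variant run in a fresh process.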
Upvotes: 1