Reputation: 93
I am using the HDF5 file format in my desktop application. I have applied GZIP level 5 compression to all the datasets inside the file.
But when I zip the HDF5 file using 7zip, the file still shrinks to around one half to one third of its size!
The process I am following is:
How is it possible?
Where is the room for further compression?
How can I generate an even smaller HDF5 file? Any suggestions about which property lists (H5P) to use?
I thought that 7zip might simply be compressing my file more aggressively using GZIP level 9, so I tried GZIP level 9 in my HDF5 file as well. The zipped file is still half the size of the original.
Upvotes: 2
Views: 2775
Reputation: 140
You are applying compression only to the dataset elements in the HDF5 file. Other components of the HDF5 file (internal metadata and objects such as groups) aren't compressed. So, when you compress the entire file, those other components get compressed too, and the already compressed dataset elements may compress a little further as well.
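To illustrate the idea, here is a minimal sketch using Python's standard zlib module. It does not read or write real HDF5; the "toy container" layout, the function name build_toy_container and all the sizes are assumptions made up for the illustration. Each payload is compressed on its own, while the zero-padded headers standing in for metadata are stored uncompressed.

```python
import random
import zlib


def build_toy_container(n_datasets=50, payload_size=4096, header_size=512, seed=0):
    """Build a toy container (NOT real HDF5): uncompressed, zero-padded
    headers interleaved with individually compressed payloads."""
    rng = random.Random(seed)
    parts = []
    for i in range(n_datasets):
        # Stand-in for metadata: stored as-is, mostly padding.
        header = f"dataset_{i:04d}".encode("ascii").ljust(header_size, b"\0")
        # Stand-in for dataset elements: moderately compressible values.
        raw = bytes(rng.randrange(16) for _ in range(payload_size))
        parts.append(header)
        parts.append(zlib.compress(raw, 5))  # per-dataset compression, level 5
    return b"".join(parts)


container = build_toy_container()
whole = zlib.compress(container, 9)  # compress the entire container afterwards

print("toy container:", len(container), "bytes")
print("after compressing the whole container:", len(whole), "bytes")
```

In this sketch the second pass gains almost all of its savings from the uncompressed headers, since the payloads were already compressed. How much a real HDF5 file shrinks depends on how much of it is metadata and unused space.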
Upvotes: 3
Reputation: 112374
gzip has a maximum compression ratio of about 1000:1. If the data is more compressible than that, then you can compress it a second time to get more compression (the second time could be gzip again). You can do a simple experiment with a file consisting of only zeros:
% dd ibs=1 count=1000000 < /dev/zero > zeros
% wc -c zeros
1000000
% gzip < zeros | wc -c
1003
% gzip < zeros | gzip | wc -c
64
So what was the compression ratio of your first compression?
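The same experiment can be sketched in Python with the standard zlib module. This is an approximation of the shell session above, not a reproduction of it: zlib.compress writes a zlib wrapper rather than a gzip header, so the byte counts will differ slightly from the gzip figures. The helper name deflate_twice is made up for the example.

```python
import zlib


def deflate_twice(data, level=9):
    """Compress data once, then compress the compressed bytes again."""
    once = zlib.compress(data, level)
    twice = zlib.compress(once, level)
    return once, twice


zeros = bytes(1_000_000)  # one million zero bytes, like the `zeros` file
once, twice = deflate_twice(zeros)
first, second = len(once), len(twice)

print("original:    ", len(zeros), "bytes")
print("first pass:  ", first, "bytes")
print("second pass: ", second, "bytes")
print("first-pass ratio: about %d:1" % (len(zeros) // first))
```

If the first-pass ratio on real data is nowhere near this limit, the extra savings from zipping the file are more likely to come from uncompressed parts of the file than from this effect.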
Upvotes: 3