Reputation: 1529
I have compressed a binary file using Huffman encoding. Now I am trying to find the compression efficiency.
In my binary file I have symbols (buch of 0 & 1) and frequency (repetition of symbols). Suppose I have:
Currently every symbol is encoded in UInt64, so size of file would be (173+50+48+45)*8=2528 bytes if my way of calculating the size is correct. Please correct me if I am wrong. On debugging I get 2536, 8 more I don't know why?
After compression I got encoding like this
Could some one please tell me how to get Huffman compression of this binary file using these information? I tried searching on Google but there is no sample of binary file they have some frequency of float type which I am not able to understand how to relate them with my Binary file.
Upvotes: 1
Views: 2972
Reputation: 1
Based on provided frequencies the tree is wrong. It must be 0 10 110 111
It always ends on with all positive bits. Solutions, which are different than Huffman trees, may work but do not provide best compression.
Upvotes: 0
Reputation: 399793
You need to simply compute how many total bits you end up with:
173 * 1 + 50 * 2 + 48 * 3 + 45 * 3
That comes to 552 bits. Converting to whole bytes gives us 69 bytes. So that's a compression to 69/2528 or about 3% of the original, assuming the de-compressor can know the dictionary and so on. Also assuming your input symbols (0 to 3) are 64-bit values, for some reason.
Upvotes: 1