Reputation: 2005
My compressor uses the frequency table to construct a Huffman tree and then an encoding and saves the frequency table and the encoding to the file.
The decompressor reads the frequency table from the file, reconstructs the Huffman tree and then decodes the encoding saved in the file.
The problem is that when two frequencies are the same, the compressor and decompressor are creating two different Huffman trees, generating different codewords and although valid the decoding breaks because they are different.
What can I do to combat this?
Regards.
Note: I'm writing this in Java.
Upvotes: 0
Views: 1628
Reputation: 574
You should check Canonical Huffman codes: http://en.wikipedia.org/wiki/Canonical_Huffman_code.
First it addresses your issue. Second, it allows you to transmit only code size differences instead of a code table or (usually worse) frequency table. If you sort your symbols during the creation of the Huffman tree, make sure to use a stable sort algorithm.
There is a Java implementation example available here: https://code.google.com/p/kanzi/source/browse/java/src/kanzi/entropy/HuffmanEncoder.java
Upvotes: 1
Reputation: 2898
the decompressor is not suppose to read the frequency table from the file, reconstructs the Huffman tree and then decodes the encoding saved in the file. The compressor should save the word encode table which is "stack" -- 000 "flow" --- 000,then the decompressor just read the encode table to get the word for code.
Upvotes: 1