Danny Rancher
Danny Rancher

Reputation: 2005

Huffman Coding - Combatting if frequencies of two letters are equal, different codeword generations are possible

My compressor uses the frequency table to construct a Huffman tree and then an encoding and saves the frequency table and the encoding to the file.

The decompressor reads the frequency table from the file, reconstructs the Huffman tree and then decodes the encoding saved in the file.

The problem is that when two frequencies are the same, the compressor and decompressor are creating two different Huffman trees, generating different codewords and although valid the decoding breaks because they are different.

What can I do to combat this?

Regards.

Note: I'm writing this in Java.

Upvotes: 0

Views: 1628

Answers (2)

flanglet
flanglet

Reputation: 574

You should check Canonical Huffman codes: http://en.wikipedia.org/wiki/Canonical_Huffman_code.

First it addresses your issue. Second, it allows you to transmit only code size differences instead of a code table or (usually worse) frequency table. If you sort your symbols during the creation of the Huffman tree, make sure to use a stable sort algorithm.

There is a Java implementation example available here: https://code.google.com/p/kanzi/source/browse/java/src/kanzi/entropy/HuffmanEncoder.java

Upvotes: 1

michaeltang
michaeltang

Reputation: 2898

the decompressor is not suppose to read the frequency table from the file, reconstructs the Huffman tree and then decodes the encoding saved in the file. The compressor should save the word encode table which is "stack" -- 000 "flow" --- 000,then the decompressor just read the encode table to get the word for code.

Upvotes: 1

Related Questions