Convention for Huffman Coding

Question

Is there a convention for generating a Huffman encoding for a certain alphabet? It seems like the resultant encoding depends both on whether you assign '0' to the left child or the right child and on how you determine which symbol goes into the left subtree.

Wikipedia says that:

As a common convention, bit '0' represents following the left child and bit '1' represents following the right child.

So that answers the first half of the question. However, I couldn't find any convention for the second half. I would assume something like putting the node with lower probability on the left, but several example Huffman trees online don't do this.

For example:

[Image: example Huffman tree]

So is there a convention for the assignment of nodes to left and right, or is it up to the implementation?

I apologize if this is a duplicate, but I wasn't able to find an answer.

Mark Adler · Accepted Answer

Yes, in fact there is. Not so much a convention for interoperability, but rather for encoding efficiency. It's called Canonical Huffman, where the codes are assigned in numerical order from the shortest codes to the longest codes, and within a single code length, they are assigned in a lexicographical order on the symbols. This permits transmitting only the length of the code for each symbol, as opposed to the entire tree structure.
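To make the ordering concrete, here is a minimal sketch of assigning canonical codes from code lengths alone. The function name `canonical_codes` and the example lengths are my own illustration, not taken from any particular library. It follows the variant where the shortest codes get the numerically smallest values (as in deflate); other formats may differ in details.

```python
def canonical_codes(lengths):
    """Map each symbol to its canonical code (as a bit string).

    `lengths` maps symbol -> code length in bits; symbols with length 0
    are treated as unused. This is a hypothetical helper for illustration.
    """
    # Canonical order: shorter codes first, then symbols in sorted order.
    ordered = sorted((n, sym) for sym, n in lengths.items() if n > 0)
    codes = {}
    code = 0
    prev_len = 0
    for n, sym in ordered:
        # Stepping up to a longer length appends zero bits to the counter.
        code <<= n - prev_len
        codes[sym] = format(code, "0{}b".format(n))
        code += 1
        prev_len = n
    return codes


example_lengths = {"a": 2, "b": 1, "c": 3, "d": 3}
print(canonical_codes(example_lengths))
# {'b': '0', 'a': '10', 'c': '110', 'd': '111'}
```

A decoder that receives only the four lengths can run the same function and arrive at identical codes, which is why the tree itself never needs to be sent.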

Generally what is done is to use the Huffman algorithm tree only to determine the number of bits for each symbol. The tree is then discarded. Bit values are never assigned to the branches. The codes are then built directly from the lengths, using the ordering above.
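As a rough sketch of the first step described above, the Huffman merge can be run purely to count how deep each symbol ends up, without ever labelling a branch with 0 or 1. The name `code_lengths` and the sample frequencies are assumptions made for this example. Note that when weights tie, a different tie-breaking rule can yield a different but equally optimal set of lengths.

```python
import heapq
from itertools import count


def code_lengths(freqs):
    """Return {symbol: code length in bits} for {symbol: frequency}.

    Hypothetical helper: the Huffman merge is used only to count depths;
    no left/right bit values are ever assigned.
    """
    if len(freqs) == 1:
        # A lone symbol still needs one bit to be representable.
        return {sym: 1 for sym in freqs}
    tiebreak = count()  # keeps heap comparisons away from the symbol lists
    heap = [(w, next(tiebreak), [sym]) for sym, w in sorted(freqs.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in freqs}
    while len(heap) > 1:
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        # Every symbol under the merged node moves one level deeper.
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (w1 + w2, next(tiebreak), syms1 + syms2))
    return lengths


freqs = {"a": 5, "b": 9, "c": 12, "d": 13, "e": 16, "f": 45}
print(code_lengths(freqs))
# {'a': 4, 'b': 4, 'c': 3, 'd': 3, 'e': 3, 'f': 1}
```

Once the lengths are known, the merge state can be thrown away; the lengths are all that the canonical assignment needs.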
