dharam

Reputation: 8106

Identifying the positions of characters in Huffman Coding Algorithm

I am reading Huffman Coding Algorithm to encode a string. I can see that the frequency of the characters is taken into account to make a tree.

Here is the frequency table :

a   b   d   e   f   h   i   k   n   o   r   s   t   u   v   _
5   1   3   7   3   1   1   1   4   1   5   1   2   1   1   9

*`_` denotes the space character

I can see there is a tree made from this, but I am not able to derive a rule for how to place the elements in the tree.

The book says that the characters with higher frequencies should be near the root, and that if more than two characters have the same frequency, they have to be placed on different sides of the root.

The question is, how do we decide the position?

In my book, a has the code 010, r has 011, and e has 100.

Can anyone please help?

Upvotes: 0

Views: 2396

Answers (2)

Mark Adler

Reputation: 112414

Once you have your tree, the choice of which branch gets the 0 and which gets the 1 at each fork in the road is arbitrary. So without a way to make that assignment canonical, there is no "right answer" for the bits assigned to each symbol, e.g. that r must be 011. r could be any three-bit value. (Though it must be three bits long for this set of frequencies.)

All that matters is that the decoder uses the same assignment of 0s and 1s as the encoder. Either you can send the codes directly, or you can send only the code lengths and assign the 0s and 1s in a canonical manner. As an example, the compression algorithm used in zip, gzip, png, etc. sends only the number of bits for each symbol. Then, starting with the smallest length, all symbols of that length are assigned codes counting up from 0, in order of the symbols' integer representation (e.g. ASCII order for characters). When moving to the next length, a bit is appended on the right and the counting continues. This assures a proper prefix code, decoded from left to right.

So in this case, the code lengths are:

2: _
3: a, e, r
4: d, f, n
5: b, h, t
6: i, k, o, s, u, v

So we get (with symbols in alphabetic order within each length):

_: 00
a: 010
e: 011
r: 100
d: 1010
f: 1011
n: 1100
b: 11010
h: 11011
t: 11100
i: 111010
k: 111011
o: 111100
s: 111101
u: 111110
v: 111111

The bit assignments here are different from what's in your book for two of the three symbols. As examples of other perfectly good canonical prefix codes, you could invert all of the bits, or invert any subset of the columns of bits, e.g. the whole first column. You could change the order of symbols within each length. You could reverse the bit order. In fact, zip, etc. store the bits shown above in reverse order, so decoding is done starting from the least significant bit, i.e. right to left.
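The canonical assignment described above can be sketched in a few lines of Python. This is a minimal illustration (the function and variable names are my own, not from any particular library): sort the symbols by (code length, symbol), assign consecutive integers as codes, and left-shift the counter whenever the length increases.

```python
# Code lengths from the answer above: symbol -> number of bits.
lengths = {
    '_': 2,
    'a': 3, 'e': 3, 'r': 3,
    'd': 4, 'f': 4, 'n': 4,
    'b': 5, 'h': 5, 't': 5,
    'i': 6, 'k': 6, 'o': 6, 's': 6, 'u': 6, 'v': 6,
}

def canonical_codes(lengths):
    codes = {}
    code = 0
    prev_len = 0
    # Sort by (length, symbol) so shorter codes come first and
    # symbols of equal length are in sorted order.
    for sym, n in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= n - prev_len          # lengthen the code on the right
        codes[sym] = format(code, '0{}b'.format(n))
        code += 1                      # next code of the same length
        prev_len = n
    return codes

codes = canonical_codes(lengths)
print(codes['a'], codes['r'])  # 010 100
```

Running this reproduces the table above exactly, including `_: 00` and `v: 111111`.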

Upvotes: 2

Sufian Latif

Reputation: 13356

Have you tried Wikipedia? There's a nice demonstration of Huffman coding there. The algorithm is simple enough: you need a priority queue.

The algorithm is somewhat like this:

1. Create tree nodes with each character and their frequencies
2. Put all the letters and their frequencies in a priority queue Q
3. Do until Q contains only one element:
    3a. Pick two lowest-frequency items a, b
    3b. Create a tree node z with frequency(z) = frequency(a) + frequency(b)
    3c. Add a and b as left and right children of z
    3d. Put z in Q
4. Pick up the only element from Q. This would be the root of the tree.
5. Assign binary codes to each leaf node according to their root-to-leaf path.

The priority queue should be a min-priority queue, i.e. the node with the lowest frequency comes out first. For equal-frequency items, use some other criterion (e.g. alphabetical order) as a tie-breaker, and apply the same tie-breaking rule in both the encoder and decoder.
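The steps above can be sketched with Python's `heapq` as the min-priority queue; this is only an illustration (the names are my own), using an insertion counter as a deterministic tie-breaker. Note that a different tie-breaking rule can produce a different, but equally optimal, set of code lengths.

```python
import heapq

def huffman_code_lengths(freqs):
    # Each heap entry: (frequency, tie_breaker, tree).
    # A tree is either a symbol (leaf) or a (left, right) pair.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        fa, _, a = heapq.heappop(heap)   # two lowest-frequency items
        fb, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (fa + fb, counter, (a, b)))
        counter += 1
    # Walk root-to-leaf paths; the depth of each leaf is its code length.
    depths = {}
    def walk(node, depth):
        if isinstance(node, tuple):
            walk(node[0], depth + 1)
            walk(node[1], depth + 1)
        else:
            depths[node] = depth
    walk(heap[0][2], 0)
    return depths

# Frequencies from the question ('_' is the space character).
freqs = {'a': 5, 'b': 1, 'd': 3, 'e': 7, 'f': 3, 'h': 1, 'i': 1,
         'k': 1, 'n': 4, 'o': 1, 'r': 5, 's': 1, 't': 2, 'u': 1,
         'v': 1, '_': 9}
depths = huffman_code_lengths(freqs)
print(depths['_'], depths['e'])  # 2 3
```

The total encoded length, `sum(freqs[s] * depths[s])`, comes out the same for any valid tie-breaking, which is what optimality of Huffman coding guarantees.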

Upvotes: 4
