devoured elysium
devoured elysium

Reputation: 105057

Corner case in Huffman Algorithm

I am trying to solve by hand two different scenarios of compression via Huffman algorithm, In each one of them, we start with an ordered queue that contains tuples (symbol, frequency) as its items, and we'll try to form a dictionary:

step 0:
[c:3] [b:4] [a:5]

step 1:
[a:5]    [7]
     [c:3] [b:4]

step 2:
     [12]
[a:5]    [7]
     [c:3] [b:4]

if we consider the left to be 0 and the right to be 1, then we have as dictionary:

a -> 0
b -> 11
c -> 10

Until now, everything looks right. But let's assume our initial queue was something like the following, instead:

step 0:
[c:1] [b:2] [a:4]

step 1:
    [3]      [a:4]
[c:1] [b:2]

step 2:
         [7]
    [3]      [a:4]
[c:1] [b:2]

that yields the following dictionary:

a -> 1
b -> 01
c -> 00

which doesn't seem right (both a and b are equal).

What am I doing wrong? I could just do an if to see in the root of the tree which one of the branches is actually a leaf, selecting that direction to be the 1's direction, but this doesn't seem clean to me. I guess I must be missing something?

Upvotes: 0

Views: 238

Answers (2)

jjrv
jjrv

Reputation: 4345

The bit sequences are not equal. If you have a bit string like:

01100

Then it can only be decompressed as "bac". You have to store the compression result in a way that preserves leading zeroes in the sequence, so for example the above could be padded to 01100000 or 00001100 to fill a byte of output and then stored along with the length 5. Of course the issue is only with the first or last byte of output, depending on which side you choose to pad.

Upvotes: 2

nhahtdh
nhahtdh

Reputation: 56809

The point is that the sequence of bits in the dictionary should not cause ambiguity when decoding. The value of the sequence doesn't matter.

Ambiguity is resolved in Huffman coding by the condition that only the leaf in the coding tree can hold the character to be encoded. The tree above follows this rule, so there is no problem with the resulting encoding.

Upvotes: 1

Related Questions