Reputation: 67
The explanation below is from Wikipedia about length-limited Huffman codes using package-merge. I can not understand, I have some questions about this.
Let L be the maximum length any code word is permitted to have. Let p1, …, pn be the frequencies of the symbols of the alphabet to be encoded. We first sort the symbols so that pi ≤ pi+1. Create L coins for each symbol, of denominations 2−1, …, 2−L, each of numismatic value pi. Use the package-merge algorithm to select the set of coins of minimum numismatic value whose denominations total n − 1. Let hi be the number of coins of numismatic value pi selected. The optimal length-limited Huffman code will encode symbol i with a bit string of length hi."
Upvotes: 2
Views: 2545
Reputation: 25532
Yes, it's just a way to build a Huffman code that has limit on codeword length.
A Huffman code encodes every letter of the alphabet with a binary string that can be uniquely determined. For example, if your alphabet is {A, B, C} and A is more common than B and C, the following encoding can work well:
A - 0
B - 10
C - 11
An encoded string such as 0010110 can be uniquely encoded, because the length of the encoding bit string is defined already by the Huffman code (here --- every string that begins with 0 has length 1, and every string that beings with 1 has length 2). So the string decodes to 0|0|10|11|0 = AABCA.
Now the "problem" in constructing Huffman codes is how to select the encoding bit strings so that the resulting encodings are on the average as short as possible. In your problem there is an additional constraint that the length of any code word cannot exceed L. The general idea is to use shorter strings to encode more common symbols.
The details of the package-merge algorithm aren't important, as the key is that you use an algorithm to select "set of coins of minimum numismatic value whose denominations total n - 1". If you have coins with denominations 2−1, 2−2, ..., and you want to build total value of 100 cents out of them, you can think of this process as having first a coin of value 100, and then splitting it to two 50 cents (2−1), and then continuing to split your coins in half as long as you want, e.g. 50 cents + 25 cents + 12.5 cents + 12.5 cents. This corresponds to the construction of a binary tree; whenever you split a coin, you create an internal node in a binary tree and add two leaves on one level deeper.
Now the idea of minimizing the numismatic value is that those "coins" that are linked to "higher frequency" symbols are more expensive to use, so you want to split those coins less, corresponding to having shorter codes.
The details are left as an exercise to the reader.
Upvotes: 1
Reputation: 12592
Maybe it is just an alternative way to build a huffman code. Have you looked at http://cbloomrants.blogspot.com/2010/07/07-02-10-length-limitted-huffman-codes.html? IMO the package-merge algorithm isn't building a huffman tree. You want to look for the Golomb Code.
Upvotes: 1