Phere Salad
Phere Salad

Reputation: 1

Quick method to merge Huffman Coding tree with repetitive entries

Say I have many repetitive entries that are to be merged into a Huffman Coding tree. Simply merging them would cost n*logn but I want it to be faster. Say I have 100000 entries of the same frequency 1, then the brute-force Huffman tree building will merge 50000 times at first, which is very slow.

My instructor told me that, we can directly consider it to be (2,50000), (4, 25000) and so on, which is much faster. Yet I have no idea how to implement it. (An explanation with Cpp code would help me a lot ;)

Upvotes: 0

Views: 40

Answers (1)

Mark Adler
Mark Adler

Reputation: 112502

If all of your "entries", which I take to mean symbols, have the same frequency, then you don't need to build a Huffman tree at all.

Given n symbols, find the largest power of two, call it k, such that k ≤ n. Then code the first 2k - n symbols using lg k bits, and the remaining 2(n - k) symbols using 1 + lg k bits. You make this a prefix code by appending one bit to the end for the second set of symbols. That gives an optimal prefix code.

So if n = 5, we have k = 4 and lg k = 2. The first three symbols are coded using two bits:

00
01
10

and the remaining two symbols are coded with three bits, adding a bit on the right after incrementing the previous two-bit code:

110
111

For n = 11:

000
001
010
011
100
1010
1011
1100
1101
1110
1111

Upvotes: 0

Related Questions