Reputation: 4763
Given the constraints imposed by Deflate (as used in zlib), I want to iterate through all possible code length distributions to find extreme cases which might break my assumptions.
Deflate assigns the bit values for Huffman codes with the lowest-valued bit patterns representing short codes and the highest-valued bit patterns representing long codes. The maximum code length is 15 bits. Discontinuities in the code assignment are not allowed. The maximum number of symbols is around 300.
My problem is that I need to iterate through all the possible code length distributions to check that my assumptions are met.
If I just iterate through different counts of each length then the vast majority of cases will make no sense. You can't have 100 codes of length 2 because they won't fit in the coding space and you can't have all codes be length 15 because they won't fill the coding space.
So there must be a faster way to step through relevant Huffman lengths for N codes visiting only combinations which fill the coding space properly.
I guess something like partitioning the lengths recursively deciding the minimum and maximum number of symbols that can be put either side of the partitions... but I haven't managed to reason it through, yet. I blame this flu I'm getting over.
What I'm actually checking for is that I think the zlib constraints mean that for a symbol to reach 15 bits it must have a lot of leading 1s. If I invert this then I can use floating-point to compact the leading zeroes (previously 1s) into a count (the exponent) rather than a long string of bits. What's the minimum number of leading 1s a code of length n must have, for example?
Upvotes: 0
Views: 105
Reputation: 112502
enough.c in the zlib distribution does exactly this. In that case, to go through all possible sets of code lengths in order to determine the maximum tables sizes needed for inflate's decoding.
There are 18,418,653,064,601,104 valid sets of code lengths for 2 to 286 symbols with a 15-bit length limit. There needs to be some pruning based on the characteristics of what you are exploring about those sets, in order to complete before the heat death of the universe. See the comments in enough.c for how this is done in the specific case of the inflate tables. enough.c covers all of those valid sets in about half a second on my machine.
Upvotes: 0