Reputation: 29
I am trying to construct an explicit example of a dynamic block. Please let me know if this is wrong.
Considering this example of lit/len alphabet:
A(0), B(0), C(1), D(0), E(2), F(2), G(2), H(2) and the rest of symbols having zero code lengths.
The sequence(SQ) of code lengths would be 0...,0,0,1,0,2,2,2,2,...0.
Then we have to compress it further with run-length encoding. So we have to calculate number of repetitions and either use flag 16 to copy the previous code length, or 17 or 18 to repeat code length 0 (using extra bits).
My problem is this. After sending the header information and the sequence of code-length code lengths in the right order 16,17,18,..., the next sequence of information would be something like:
18, some extra bits value,1,0,2,16, some extra bits value 0,18, some extra bits value. (Probably there would be another 18 flag since the maximum repeat count is 138.)
Then we have the same thing with the distance alphabet and finally the inputs data encoded with the canonical Huffman, and extra bits if necessary.
Is it necessary to send the code lengths of 0? If so, why?
If yes, why is it necessary to have hclit and hcdist and not only hclen, knowing that the lengths of the sequences are 286 for lit/len and 30 for distances?
If not what would be the real solution?
Another problem:
In this case we have code length 2 with repetitions (3) extra bits value of 0.
If yes I can't understand how: flag 18 has next a maximum possible extra bits value of 127 (1111111) representing 138 repetitions and it couldn't be included into the alphabet symbols of 0-18.
P.S When I say extra bits in this case I mean the factor that it is used to know how many repetitions of the previous length are used.
More precisely 0 - 15 we have 0 bit factor repetition, for 16,17,18 we have 2,3,7 bits repetitions factor. The value of those bits is what I mean with extra bits value.
I think I'm missing something about what Huffman codes are generated by the Huffman code-length alphabet.
Upvotes: 0
Views: 165
Reputation: 112502
First off, your example code is invalid, since it oversubscribes the available bit patterns. 2,2,2,2 would use all of the bit patterns, since there are only four possible two-bit patterns. You can't have one more code, of any length. Possible valid codes for five symbols is 1,2,3,4,4, or 2,2,2,3,3.
To answer your questions in order:
You need to send the leading zeros, but you do not need to send the trailing zeros. The HLIT and HDIST counts determine how many lengths of each code type are in the header, where any after those are taken to be zero. You need to send the zeros, since the code lengths are associated with the corresponding symbol by their position in the list.
It saves space in header to have the HLIT and HDIST counts, so that you don't need to provide lengths for all 316 codes in every header.
I don't understand this question, but I guess it doesn't apply.
If I understand your question, extra bits have nothing to do with the descriptions of the Huffman codes in the headers. The extra bits are implied by the symbol. In any case, a repeated length is encoded with code 16, not code 18. So the four twos would be encoded as 2, 16(0), where the (0) represents two extra bits that are zeros.
Upvotes: 0