Reputation: 550
my hope for this question (see: bottom) is to lay out as much as I know about the deflate process, and I can receive corrections on areas where I am (perhaps very) misinformed. Hopefully, at the end of it, this question could be a handy resource.
The first two bytes equate to a header for the zlib compression used with the format of (credit)
---CMF--- ---FLG---
0111.1000 1101.0101
CINF -CM- +-||
| |+- FCHECK
| +-- FDICT
+---- FLEVEL
From RFC 1950, right to left:
FCHECK (1.0101) - validates that CMF & FLG as a 16 bit unsigned int is a multiple of 31
FDICT (0) - if set, indicates preset DICT following immediately after FLG
FLEVEL (11) - compression "intensity" [0-3]
CM (1000) - for compression method, where CM = 8 == "deflate" compression method
CINF (0111) - indicates the size of the sliding window used, where CINF = 7 == 32K sliding window
The next three bits in the NEW BYTE equate to a header for the Huffman encoded block:
---CMF--- ---FLG--- NEW BYTE
0111.1000 1101.0101 11101100
|-|
| +- BFINAL
+--- BTYPE
From RFC 1951 right to left:
BFINAL (0) - is set (1) if this is the last block of data
BTYPE (10) - Huffman encoding : (00)none; (01)Fixed Huffman codes; (10) dynamic codes; (11) invalid
From here I will work off the assumption of BTYPE = (10)
The following values immediately proceed:
NEW BYTE NXT BYTE
(11101)100 -> 101)(11101) -> 0111111(1
|-|
| +- BFINAL
+--- BTYPE
HLIT
(11101) - 5-bit number of length/literal codes, 257 added (257-286)
HDIST
(11101) - 5-bit number of distance codes, 1 added (1-32)
HCLEN
(1111) - 4-bit number of code-length codes, 4 added (4-19)
Immediately following this is HCLEN
(don't forget +4) 3-bit fields, where the values are assigned to this sequence, in order:
16 17 18 0 8 7 9 6 10 5 11 4 12 3 13 2 14 1 15
Since HCLEN = 19, the whole sequence is used
A code length of 0
in that sequence means the corresponding symbol is not used.
As a graphical example, after reading the 19x3 bits we have six extra bits(extra bits in brackets):
NXT BYTE 00000000 00000000 00000000 00000000 00000000 00000000 [000000](00
Do the last bits in brackets above, get thrown away?
Upvotes: 5
Views: 1021
Reputation: 112259
No. The only times in a deflate stream that you skip bits to go to a byte boundary are for a stored block (00), or when the end code in the last block is read. After the bits for the code length code lengths, you continue with the subsequent bits to use the generated Huffman code to read the code lengths.
Upvotes: 7