Trés DuBiel
Trés DuBiel

Reputation: 550

Deflate compression spec clarification(s)

my hope for this question (see: bottom) is to lay out as much as I know about the deflate process, and I can receive corrections on areas where I am (perhaps very) misinformed. Hopefully, at the end of it, this question could be a handy resource.

Zlib header

The first two bytes equate to a header for the zlib compression used with the format of (credit)

---CMF---  ---FLG---
0111.1000  1101.0101
CINF -CM-  +-||
           | |+- FCHECK
           | +-- FDICT
           +---- FLEVEL

From RFC 1950, right to left:

  1. FCHECK (1.0101) - validates that CMF & FLG as a 16 bit unsigned int is a multiple of 31

  2. FDICT (0) - if set, indicates preset DICT following immediately after FLG

  3. FLEVEL (11) - compression "intensity" [0-3]

  4. CM (1000) - for compression method, where CM = 8 == "deflate" compression method

  5. CINF (0111) - indicates the size of the sliding window used, where CINF = 7 == 32K sliding window

Data block header

The next three bits in the NEW BYTE equate to a header for the Huffman encoded block:

---CMF---  ---FLG---  NEW BYTE
0111.1000  1101.0101  11101100
                           |-|
                           | +- BFINAL
                           +--- BTYPE

From RFC 1951 right to left:

  1. BFINAL (0) - is set (1) if this is the last block of data

  2. BTYPE (10) - Huffman encoding : (00)none; (01)Fixed Huffman codes; (10) dynamic codes; (11) invalid

The Huffman Codes

From here I will work off the assumption of BTYPE = (10)

The following values immediately proceed:

NEW BYTE                NXT BYTE                  
(11101)100       ->     101)(11101)   ->   0111111(1
       |-|
       | +- BFINAL
       +--- BTYPE
  1. HLIT (11101) - 5-bit number of length/literal codes, 257 added (257-286)

  2. HDIST (11101) - 5-bit number of distance codes, 1 added (1-32)

  3. HCLEN (1111) - 4-bit number of code-length codes, 4 added (4-19)

Immediately following this is HCLEN(don't forget +4) 3-bit fields, where the values are assigned to this sequence, in order:

16 17 18 0 8 7 9 6 10 5 11 4 12 3 13 2 14 1 15

Since HCLEN = 19, the whole sequence is used

A code length of 0 in that sequence means the corresponding symbol is not used.

As a graphical example, after reading the 19x3 bits we have six extra bits(extra bits in brackets):

NXT BYTE 00000000 00000000 00000000 00000000 00000000 00000000 [000000](00

My Question

Do the last bits in brackets above, get thrown away?

Upvotes: 5

Views: 1021

Answers (1)

Mark Adler
Mark Adler

Reputation: 112259

No. The only times in a deflate stream that you skip bits to go to a byte boundary are for a stored block (00), or when the end code in the last block is read. After the bits for the code length code lengths, you continue with the subsequent bits to use the generated Huffman code to read the code lengths.

Upvotes: 7

Related Questions