Ryan Laursen
Ryan Laursen

Reputation: 647

What exactly to do at max code table size followed by clear code read in a GIF LZW decompression algorithm?

I've been working on learning how to encode and decode GIF image files. Here is my test image:

Here is my test image:

Code with problem replaced with return, stops before error:

from enum import Enum

class Codes(Enum):
    CLEAR = 0
    EOI = 1

def decode(code_bytes: bytes):
    """Return decoded gif from incoming bytes, currently only works with entire bytes object"""
    lzw_min, leng, *_ = code_bytes
    total = 2 + leng

    # Skip lzw_min and sub-block size indicator bytes
    skips = {0, 1, total}
    while leng != 1:
        leng = code_bytes[total] + 1
        total += leng
        skips |= {total}

    def _stream(skips):
        """Return least significant bit still available from current byte"""
        for i, byte in enumerate(code_bytes):
            if i in skips:
                continue
            for _ in range(8):
                yield byte & 1 and 1
                byte >>= 1

    code_stream = _stream(skips)

    def get_code(bits, x=0, s=0):
        """Retrieve bits and assemble in proper order"""
        for n in range(s, bits):
            x |= next(code_stream) << n
        return x

    code_table = {
            **{i: [i] for i in range(2 ** lzw_min)},
            2 ** lzw_min: [Codes.CLEAR], 
            2 ** lzw_min + 1: [Codes.EOI]
    }
    bits = lzw_min + 1
    assert Codes.CLEAR in code_table[get_code(bits)]  # First code is always CLEAR
    table_index = init_table = len(code_table) - 1
    last = get_code(bits)
    code = get_code(bits)
    yield last
    while Codes.EOI not in code_table.get(code, []):

        if code <= table_index:
            for i in code_table[code]:
                yield i
            table_index += 1
            code_table[table_index] = code_table[last] + [code_table[code][0]]

        else:
            k = code_table[last][0]
            for i in code_table[last] + [k]:
                yield i
            table_index += 1
            code_table[table_index] = code_table[last] + [k]

        if table_index == 2**bits-1:
            bits += 1

        # Problem replaced here
        if bits == 13:
            return

        last, code = code, get_code(bits)

Output from above with the second frame of GIF:

enter image description here

Currently, output from the same input with problem included:

test image output, only the top half

So here is where the problem lies:

        if bits == 13:
            last, code = code, get_code(bits)
            bits = lzw_min + 1
            table_index = init_table
            assert Codes.CLEAR in code_table[code]

While reading the image, when table_index reaches one under the max bitlength and increments bits past the max bitlength, this code executes. It apparently isn't functioning properly.

After reaching max bit size, a CLEAR code is read next, as expected, no AssertionError is raised.

From what I've read, a CLEAR code means reset bitlength to lzw_min + 1 and re-initialize the code table. I believe that's what my code does.

Originally, code_table was a list and was reset after the CLEAR and bitlength reset, but codes much larger than the current table index kept being output from the stream. making code table a dict allows currently un-overwritten codes to be accessed after re-initializing, but this just produces noise. Apparently, some of the noise eventually registers as EOI and parsing ends.

What is wrong with what this algorithm does after encountering a maximum size code followed by a CLEAR?


Recreating my issue (requires pillow):

Here are the second frame of the test gif as a bytes literal you may use for function input and the frame's palette as a list literal for building the img:

bytes: (removed from pastebin)

palette:

[(230, 226, 225), (54, 28, 25), (99, 25, 28), (117, 22, 28), (65, 39, 33), (79, 45, 38), (92, 39, 36), (81, 45, 39), (79, 48, 40), (88, 50, 43), (100, 42, 38), (104, 60, 50), (111, 66, 55), (119, 68, 57), (138, 11, 23), (134, 14, 24), (139, 14, 25), (151, 9, 23), (148, 13, 26), (156, 12, 26), (132, 18, 27), (141, 18, 29), (149, 19, 30), (166, 7, 23), (173, 7, 24), (164, 11, 26), (172, 11, 27), (184, 4, 23), (181, 7, 25), (190, 6, 25), (180, 10, 27), (191, 9, 28), (166, 16, 30), (142, 22, 33), (142, 25, 35), (151, 23, 35), (146, 27, 37), (147, 30, 41), (189, 14, 33), (169, 23, 37), (180, 20, 37), (180, 23, 39), (188, 18, 36), (188, 26, 43), (149, 35, 45), (155, 36, 42), (151, 41, 51), (153, 43, 53), (134, 63, 56), (151, 48, 57), (156, 51, 60), (166, 40, 52), (173, 42, 51), (184, 39, 54), (167, 53, 55), (192, 7, 26), (192, 10, 29), (193, 14, 33), (194, 19, 37), (196, 26, 39), (195, 22, 40), (195, 27, 44), (196, 31, 48), (199, 34, 45), (197, 36, 52), (201, 42, 53), (198, 42, 58), (200, 45, 60), (202, 51, 60), (131, 75, 63), (175, 67, 61), (159, 55, 64), (157, 59, 67), (161, 59, 68), (170, 58, 69), (185, 55, 68), (202, 52, 67), (199, 56, 70), (204, 60, 75), (208, 62, 64), (148, 81, 71), (164, 68, 76), (171, 69, 76), (183, 70, 73), (189, 86, 75), (165, 75, 83), (169, 76, 84), (184, 75, 85), (170, 90, 83), (168, 84, 91), (172, 84, 91), (169, 99, 82), (177, 104, 86), (182, 106, 88), (172, 90, 97), (181, 92, 99), (173, 102, 108), (181, 100, 107), (183, 109, 114), (186, 118, 123), (198, 67, 72), (205, 65, 78), (206, 77, 73), (206, 69, 82), (202, 74, 87), (207, 73, 86), (208, 74, 87), (208, 77, 90), (199, 85, 90), (215, 91, 82), (209, 81, 94), (199, 104, 87), (218, 106, 92), (195, 114, 93), (203, 114, 92), (212, 112, 95), (201, 90, 100), (210, 85, 97), (211, 90, 101), (212, 94, 105), (199, 99, 108), (213, 98, 109), (202, 119, 99), (212, 122, 99), (215, 121, 100), (217, 125, 102), (201, 105, 114), (214, 102, 113), (215, 106, 117), (215, 109, 119), (216, 111, 120), (199, 116, 123), (212, 115, 124), (217, 115, 124), (225, 121, 103), (221, 130, 107), (226, 133, 110), (228, 135, 112), (230, 136, 113), (232, 138, 114), (224, 140, 121), (184, 125, 129), (198, 124, 130), (215, 122, 130), (219, 123, 132), (185, 133, 137), (189, 143, 146), (190, 149, 152), (191, 160, 162), (198, 132, 137), (215, 135, 141), (221, 131, 139), (201, 141, 145), (212, 141, 146), (222, 139, 146), (203, 150, 154), (214, 150, 154), (226, 153, 135), (224, 142, 149), (224, 144, 151), (225, 148, 155), (226, 153, 159), (224, 169, 156), (200, 158, 161), (214, 158, 162), (227, 156, 162), (200, 167, 169), (210, 162, 165), (216, 166, 169), (213, 170, 173), (201, 175, 177), (216, 174, 176), (205, 184, 185), (218, 179, 180), (215, 182, 184), (221, 187, 188), (228, 161, 166), (229, 165, 170), (230, 170, 174), (231, 173, 177), (232, 174, 178), (232, 178, 181), (227, 183, 185), (233, 182, 185), (234, 186, 189), (212, 191, 192), (228, 191, 193), (235, 190, 193), (207, 203, 202), (211, 195, 196), (212, 205, 204), (218, 201, 201), (212, 208, 207), (214, 210, 210), (216, 212, 212), (219, 216, 215), (222, 218, 217), (226, 195, 196), (236, 195, 197), (228, 199, 200), (237, 199, 200), (229, 203, 203), (238, 203, 204), (238, 207, 208), (240, 207, 208), (230, 213, 213), (235, 212, 212), (226, 221, 220), (237, 220, 219), (241, 211, 212), (241, 215, 216), (242, 219, 219), (228, 224, 223), (239, 224, 223), (242, 224, 223), (230, 226, 224), (234, 229, 228), (237, 232, 231), (239, 234, 232), (244, 228, 227), (245, 232, 231), (244, 237, 235), (248, 239, 237), (246, 241, 239), (248, 241, 239), (247, 242, 240), (248, 242, 240), (250, 248, 246), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0)]

Here is the code needed to produce the same outputs as me with these literals:

from PIL import Image
w, h = 380, 407
gifo = Image.new('RGB', (w, h))
gif = gifo.load()
pix = decode(FRAME_2)  # bytes literal as input
palette = PALETTE      # palette literal as input
try:
    for y in range(h):
        for x in range(w):
            t = next(pix)
            while t in {Codes.EOI, Codes.CLEAR}: # Only needed to handle buggy noise with codes mixed into it
                    t = next(pix)
            gif[x, y] = palette[t]
except StopIteration:
    ...
gifo.show()

Resources I've used in this project:

http://giflib.sourceforge.net/whatsinagif/lzw_image_data.html

https://www.daubnet.com/en/file-format-gif

https://www.w3.org/Graphics/GIF/spec-gif89a.txt

https://www.eecis.udel.edu/~amer/CISC651/lzw.and.gif.explained.html

Regarding deferred CLEAR codes, though this gif does not contain these and I did not include logic for them in this question:

https://halicery.com/Image%20Decoders/GIF/GIF-LZW%20Deferred%20Clear%20Code.html

Upvotes: 1

Views: 420

Answers (1)

Ryan Laursen
Ryan Laursen

Reputation: 647

I figured it out. Among other things, I was pulling a 13 bit CLEAR code instead of 12 bit and I didn't cycle to the next data when initializing a table properly. This version works great.

BYTE = 8
MAX_BITLEN = 12

def decode(code_bytes):
    """Yield decoded gif-style LZW from incoming bytes."""
    bytes_iter = iter(code_bytes)
    lzw_min = next(bytes_iter)
    get_code = _get_code_getter(bytes_iter)
    code = clear = get_code(lzw_min + 1)
    eoi = clear + 1
    while code != eoi:
        if code == clear:
            bitlength = lzw_min + 1
            code_table = [[i] for i in range(eoi + 1)]
            last, code = get_code(bitlength), get_code(bitlength)
            yield last
        if code < len(code_table):
            output = code_table[code]
            to_add = code_table[last] + [code_table[code][0]]
        else:
            to_add = output = code_table[last] + [code_table[last][0]]
        for i in output:
            yield i
        code_table += [to_add]
        bitlength += len(code_table) == (bitlength < MAX_BITLEN) << bitlength
        last, code = code, get_code(bitlength)


def _get_code_getter(code_iter):
    def bit_stream():
        """Yield least significant bit remaining in current byte."""
        length = next(code_iter)
        for read, byte in enumerate(code_iter):
            if read == length:
                length += byte + 1
            else:
                for bit in (1 << b & byte and 1 for b in range(BYTE)):
                    yield bit

    def code_getter(bitlength):
        """Retrieve/structure bits, least significant first."""
        return sum(next(code_stream) << z for z in range(bitlength))

    code_stream = bit_stream()
    return code_getter

Upvotes: 2

Related Questions