dragon66

Reputation: 2715

What's special about TIFF 5.0 style LZW compression

I am in the middle of writing a TIFF decoder. The LZW decoder I am using works fine with all the LZW compressed GIF and TIFF images except one which will overflow the buffer of the decoded code string. I tested it with TIFFLZWDecompressor from com.sun.media.imageioimpl.plugins.tiff package and it throws the following exception "java.lang.UnsupportedOperationException: TIFF 5.0-style LZW codes are not supported".

I have been trying to find what is special about the 5.0-style LZW without success. Does anyone have any idea about this?

Note: from TIFFLZWDecompressor source code, the indicator for a TIFF 5.0-style LZW compression is the first two bytes {0x00, 0x01} of the compressed data.

Upvotes: 2

Views: 895

Answers (2)

SBS

Reputation: 836

I bumped into the same problem recently while writing a TIFF LZW encoder. A TIFF check tool complained about "old-style LZW codes", although it decoded the image properly. After some research, I found out that the implementation of the LZW compressor changed at some point. The original ("old-style") format used exactly the same mode of operation as the GIF LZW compressor. In fact, you can take a working GIF compressor and drop it into a TIFF implementation without much effort, and it will yield files that are accepted by most TIFF readers. (One notable exception I've found was Corel PaintShop Pro X7.)

The difference between "old-style" and "new-style" applies to two encoding details:

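  • Bit order: "new-style" packs LZW codes into bytes most-significant bit first, whereas "old-style" packs them least-significant bit first, as GIF does.
  • Code size: "new-style" increases the code size one symbol earlier than "old-style" (so-called "Early Change").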
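From the decoder's side, those two points come down to how bits are pulled out of the stream and when the code width grows. Here is a rough Java sketch of both. The names (LzwVariantSketch, BitReader, nextWidth) are made up for illustration, and the thresholds reflect my understanding of how common decoders such as libtiff behave, so treat it as a starting point rather than a reference implementation.

```java
// Sketch only: hypothetical helper names, not taken from any library.
final class LzwVariantSketch {
    static final int MAX_WIDTH = 12; // LZW codes in TIFF and GIF never exceed 12 bits

    // Called by a decoder after it has added a table entry.
    // nextFreeCode is the index the NEXT entry would get.
    // Old-style (GIF-like): widen once nextFreeCode reaches 2^width.
    // New-style ("Early Change"): widen one entry sooner, at 2^width - 1.
    static int nextWidth(int nextFreeCode, int width, boolean earlyChange) {
        int threshold = (1 << width) - (earlyChange ? 1 : 0);
        return (nextFreeCode >= threshold && width < MAX_WIDTH) ? width + 1 : width;
    }

    // Reads variable-width codes either MSB-first (new-style) or LSB-first (old-style).
    static final class BitReader {
        private final byte[] data;
        private final boolean msbFirst;
        private int pos;       // index of the next byte to load
        private long bitBuf;   // bits loaded but not yet consumed
        private int bitCount;  // number of valid bits in bitBuf

        BitReader(byte[] data, boolean msbFirst) {
            this.data = data;
            this.msbFirst = msbFirst;
        }

        // Returns the next code, or -1 if the data runs out.
        int readCode(int width) {
            while (bitCount < width) {
                if (pos >= data.length) {
                    return -1;
                }
                int b = data[pos++] & 0xFF;
                if (msbFirst) {
                    bitBuf = (bitBuf << 8) | b;      // new byte goes below the pending bits
                } else {
                    bitBuf |= (long) b << bitCount;  // new byte goes above the pending bits
                }
                bitCount += 8;
            }
            int mask = (1 << width) - 1;
            int code;
            if (msbFirst) {
                code = (int) (bitBuf >>> (bitCount - width)) & mask;
            } else {
                code = (int) bitBuf & mask;
                bitBuf >>>= width;
            }
            bitCount -= width;
            return code;
        }
    }
}
```

A decoder would create one BitReader per strip, start at width 9, and call nextWidth after each table insertion, resetting the width to 9 on every clear code.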

Clever TIFF decoders inspect the first two bytes of the bit stream to detect "old-style" encoding. This is possible because the first symbol emitted is always the 9-bit clear code 0x100. In an "old-style" stream the low 8 bits of that code are written first, so the first byte is 0x00 and the leading 1 bit ends up in the lowest bit of the second byte (hence 0x00, 0x01). A "new-style" stream starts with the 1 bit in the top position, so the first byte is 0x80.
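The check itself is tiny. As far as I remember, libtiff does something equivalent before decoding a strip: it looks for a zero first byte and a set low bit in the second byte, which is what a clear code packed least-significant bit first looks like. A minimal sketch, where the class and method names are my own invention:

```java
// Sketch only: TiffLzwStyle and looksOldStyle are hypothetical names.
final class TiffLzwStyle {
    // Returns true if the compressed strip appears to start with a 9-bit
    // clear code (256) that was packed least-significant bit first.
    static boolean looksOldStyle(byte[] strip) {
        return strip.length >= 2
                && strip[0] == 0            // low 8 bits of the clear code are all zero
                && (strip[1] & 0x01) != 0;  // ninth bit lands in bit 0 of the second byte
    }
}
```

Only the lowest bit of the second byte is tested, because the remaining seven bits already belong to the next code.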

Upvotes: 4

Harald K

Reputation: 27094

The TIFF 6.0 spec says:

It is also possible to implement a version of LZW in which the LZW character depth equals BitsPerSample, as described in Draft 2 of Revision 5.0. But there is a major problem with this approach. If BitsPerSample is greater than 11, we can not use 12-bit-maximum codes and the resulting LZW table is unacceptably large.

(TIFF6.pdf, pages 58-59)

It could be this is what they are referring to.

On the other hand... In my own reader I found:

NOTE: This is a spec violation. However, libTiff reads such files. TIFF 6.0 Specification, Section 13: "LZW Compression"/"The Algorithm", page 61, says: "LZW compression codes are stored into bytes in high-to-low-order fashion, i.e., FillOrder is assumed to be 1. The compressed codes are written as bytes (not words) so that the compressed data will be identical whether it is an ‘II’ or ‘MM’ file."

The thing about 0x00, 0x01 is actually the "clear code" written in "reverse" (i.e., packed low-order bits first, rather than high-to-low as the spec says).
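To see where those two bytes come from, you can pack the 9-bit clear code (256) by hand in both bit orders. This is just an illustrative sketch with made-up names (ClearCodePacking, packCode), not code from my reader or from libTiff:

```java
// Sketch only: hypothetical helper for packing a single code into bytes.
final class ClearCodePacking {
    // Packs one code of the given bit width into as many bytes as needed.
    // msbFirst = true  : high-to-low order, as the TIFF 6.0 spec describes
    // msbFirst = false : low-to-high order, as GIF does
    static byte[] packCode(int code, int width, boolean msbFirst) {
        byte[] out = new byte[(width + 7) / 8];
        for (int i = 0; i < width; i++) {
            // i is the position of the bit in stream order
            int bit = msbFirst ? (code >> (width - 1 - i)) & 1 : (code >> i) & 1;
            int shift = msbFirst ? 7 - (i % 8) : i % 8;
            out[i / 8] |= (byte) (bit << shift);
        }
        return out;
    }
}
```

With these definitions, packCode(256, 9, true) gives the bytes 0x80, 0x00, while packCode(256, 9, false) gives 0x00, 0x01, which is the signature mentioned in the question.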

Upvotes: 2
