Matthieu M.

Reputation: 300179

How to know when the output buffer is too small when decompressing with LZ4?

The documentation of LZ4_decompress_safe says:

/*! LZ4_decompress_safe() :
    compressedSize : is the precise full size of the compressed block.
    maxDecompressedSize : is the size of destination buffer, which must be already allocated.
    return : the number of bytes decompressed into destination buffer (necessarily <= maxDecompressedSize)
             If destination buffer is not large enough, decoding will stop and output an error code (<0).
             If the source stream is detected malformed, the function will stop decoding and return a negative result.
             This function is protected against buffer overflow exploits, including malicious data packets.
             It never writes outside output buffer, nor reads outside input buffer.
*/
LZ4LIB_API int LZ4_decompress_safe (const char* source, char* dest, int compressedSize, int maxDecompressedSize);

But it doesn't specify how to distinguish whether the failure comes from a destination buffer that is too small or from malformed input, a bad combination of parameters, etc.

In the case where I don't know what the target decompressed size is, how can I know whether I should retry with a bigger buffer, or not?

Upvotes: 4

Views: 3120

Answers (1)

Matthieu M.

Reputation: 300179

There is an open issue about this, and for now there is no public API to distinguish between the errors.


As a heuristic, looking at the code shows the possible return values:

    /* end of decoding */
    if (endOnInput)
       return (int) (((char*)op)-dest);     /* Nb of output bytes decoded */
    else
       return (int) (((const char*)ip)-source);   /* Nb of input bytes read */

    /* Overflow error detected */
_output_error:
    return (int) (-(((const char*)ip)-source))-1;

So there are only 2 cases:

  • either the decoding was successful, and you get a non-negative result (whose meaning depends on endOnInput: the number of output bytes decoded, or the number of input bytes read)
  • or the decoding was unsuccessful and you get a negative result

In the case of the negative result, the value is -(position_in_input + 1).
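As a small illustration, the input position can be recovered from a negative result by inverting that formula. This is only a sketch: lz4_error_position is a hypothetical helper name, and it relies on the internal encoding shown in the snippet above, which is not a documented guarantee of the library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of LZ4): recovers the input offset at which
 * decoding stopped from a negative return value.
 * Assumes the internal encoding shown above: ret == -(position_in_input) - 1. */
static size_t lz4_error_position(int ret)
{
    assert(ret < 0);               /* only meaningful for error results */
    return (size_t)(-(ret + 1));   /* -(ret + 1) cannot overflow, even for INT_MIN */
}
```

For instance, a return value of -1 maps to position 0, and -6 maps to position 5.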

This suggests that guessing whether the destination buffer was too small can be accomplished with a good likelihood of success by retrying with a (much) bigger buffer, and checking whether the failure occurs in the same position:

  • if the second decompression attempt succeeds, you're good!
  • if the second decompression attempt fails at the same position, then the issue is likely with the input,
  • otherwise, you have to try with a bigger buffer again.

Put differently: as long as the result differs from the previous attempt, try again; once it stops changing, that is your result.
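The retry heuristic described above can be sketched roughly as follows. Everything named here is hypothetical: decompress_with_retry, the decompress_fn typedef, the growth factor of 4 and the maxCapacity cut-off are assumptions of this sketch, not part of LZ4. The decoder is passed as a function pointer with the same signature as LZ4_decompress_safe, and toy_decompress is only a stand-in (one run length per input byte, 0 meaning "malformed") so that the loop can be exercised without linking liblz4.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Same signature as LZ4_decompress_safe. */
typedef int (*decompress_fn)(const char* src, char* dst, int srcSize, int dstCapacity);

/* Hypothetical helper. On success, returns the decompressed size and stores a
 * malloc'ed buffer in *out (the caller frees it). On failure, returns the last
 * negative result from the decoder, or INT_MIN if allocation failed. */
static int decompress_with_retry(decompress_fn decompress,
                                 const char* src, int srcSize,
                                 int capacity, int maxCapacity, char** out)
{
    assert(capacity > 0 && capacity <= maxCapacity);
    int previous = 0;                 /* 0 = no failure yet; failures are < 0 */
    *out = NULL;

    for (;;) {
        char* dst = malloc((size_t)capacity);
        if (dst == NULL) return INT_MIN;

        int result = decompress(src, dst, srcSize, capacity);
        if (result >= 0) { *out = dst; return result; }
        free(dst);

        /* Same failure position twice in a row: the input is likely at fault. */
        if (result == previous) return result;
        /* Arbitrary cut-off of this sketch: stop once the cap has been tried. */
        if (capacity == maxCapacity) return result;

        previous = result;
        capacity = (capacity > maxCapacity / 4) ? maxCapacity : capacity * 4;
    }
}

/* Toy stand-in decoder, NOT LZ4: each input byte is a run length of 'x' bytes,
 * a zero byte is "malformed". Fails with -(position) - 1 like the code above. */
static int toy_decompress(const char* src, char* dst, int srcSize, int dstCapacity)
{
    int written = 0;
    for (int i = 0; i < srcSize; ++i) {
        int run = (unsigned char)src[i];
        if (run == 0 || run > dstCapacity - written) return -i - 1;
        memset(dst + written, 'x', (size_t)run);
        written += run;
    }
    return written;
}
```

With the real library, one would presumably pass LZ4_decompress_safe as the first argument, since its signature matches decompress_fn.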


Limitation

The input pointer does not necessarily advance one byte at a time: in two places it advances by length bytes, where length is read from the input and is unbounded.

If decoding fails because the output buffer was too small, and the new output buffer is still too small for length, then decoding will fail in the same position even though the input is not (necessarily) malformed.

If false positives are an issue, then one may attempt to:

  • decode the length, by checking the input stream at the position returned,
  • simply allocate 255 * <input size> - 2526 as per Mark Adler's answer, which is reasonable for small inputs.
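The second option amounts to a one-line computation. The sketch below evaluates the formula in 64-bit arithmetic; lz4_worst_case_capacity is a hypothetical name, and returning 0 when the formula gives a non-positive value (very small inputs) or a value that does not fit in an int is an assumption of this sketch, not something taken from the LZ4 documentation.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper (not part of LZ4): evaluates 255 * inputSize - 2526.
 * Returns 0 when the result is not positive or does not fit in an int,
 * i.e. when the formula gives no usable capacity for LZ4_decompress_safe. */
static int lz4_worst_case_capacity(int inputSize)
{
    assert(inputSize >= 0);
    long long bound = 255LL * inputSize - 2526LL;   /* 64-bit to avoid overflow */
    if (bound <= 0 || bound > INT_MAX) return 0;
    return (int)bound;
}
```

For a 100-byte input this gives 22974 bytes, which is why the approach is only reasonable for small inputs.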

Upvotes: 4
