How does the LZMA decompression algorithm work?

Question

I am stuck trying to understand how the LZMA decompression algorithm works, more precisely this function:

int res = LzmaDec_DecodeToBuf(CLzmaDec *p, Byte *dest, SizeT *destLen, 
        const Byte *src, SizeT *srcLen, ELzmaFinishMode finishMode);

It is called in an infinite loop, with fixed values for *destLen and *srcLen that are equal. Logically, destLen (uncompressed data) should be greater than srcLen (compressed data) by some predefined ratio, so I don't understand how it works with equal sizes. Or can it accept any size because it keeps a temporary buffer that stores the data and then processes it with its own method?

Codo · Accepted Answer

This function is indeed difficult to use. It is built for incremental decompression, so the envisioned use case looks like this:

A program reads compressed data from a source like a file. The data is read in chunks, e.g. 1000 bytes at a time.
Each chunk is decompressed into a buffer. The buffer is usually bigger, e.g. 2000 bytes. After decompressing a chunk, the destination buffer will be partially filled with data, e.g. 1738 bytes.
For each chunk, the decompressed data will be further processed, e.g. written to a file.

So simplified code for decompressing a file and writing the result to another file looks like so:


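file_in = open_file(...);
file_out = open_file(...);
while (!eof(file_in)) {
   /* read the next chunk of compressed data */
   len_in = read(file_in, buf_in, 1000);
   /* decompress it; len_out is the number of bytes produced */
   len_out = decompress(buf_in, len_in, buf_out, 2000);
   /* process the decompressed data, here by writing it out */
   write(file_out, buf_out, len_out);
}
close(file_in);
close(file_out);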

So the relevant parameters of LzmaDec_DecodeToBuf are:

dest: pointer to destination buffer
destLen: pointer to destination length. The caller sets the length to the allocated buffer size. LzmaDec_DecodeToBuf updates the length to contain the length of the decompressed data.
src: pointer to source buffer
srcLen: pointer to source length. The caller sets the length to the length of the compressed data in the buffer. LzmaDec_DecodeToBuf updates the length to contain the number of processed bytes.

srcLen can be tricky. In most cases, all data passed to the function will be processed. But if the decompressed data does not fit into the destination buffer, the function processes only part of the input. The decompressed data then needs to be written to the file to free up the destination buffer, and the function needs to be called again with the remaining, unprocessed input. This has been omitted from the simplified code.
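The retry logic described above can be sketched in C. This is only an illustration under stated assumptions: toy_decode_to_buf, Sink, sink_write and decode_chunk are hypothetical names, and the toy decoder is not LZMA (it simply emits every input byte twice). It only mimics the in/out contract of destLen and srcLen, so that the inner loop can be shown without the LZMA SDK.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for LzmaDec_DecodeToBuf, NOT the LZMA SDK:
   it "decompresses" by emitting every source byte twice, but follows
   the same in/out length contract for destLen and srcLen. */
static void toy_decode_to_buf(unsigned char *dest, size_t *destLen,
                              const unsigned char *src, size_t *srcLen)
{
    size_t in = 0, out = 0;
    /* Consume a source byte only if both output bytes still fit. */
    while (in < *srcLen && out + 2 <= *destLen) {
        dest[out++] = src[in];
        dest[out++] = src[in];
        in++;
    }
    *destLen = out;  /* out: bytes actually written */
    *srcLen = in;    /* out: bytes actually consumed */
}

/* Hypothetical sink that stands in for write(file_out, ...). */
typedef struct {
    unsigned char data[64];
    size_t len;
} Sink;

static void sink_write(Sink *s, const unsigned char *buf, size_t n)
{
    assert(s->len + n <= sizeof s->data);
    memcpy(s->data + s->len, buf, n);
    s->len += n;
}

/* Decode one input chunk completely, even when the output buffer is
   too small to hold the whole result in a single call. */
static void decode_chunk(const unsigned char *chunk, size_t chunk_len,
                         unsigned char *buf_out, size_t buf_out_size,
                         Sink *sink)
{
    size_t pos = 0;
    assert(buf_out_size >= 2);  /* the toy decoder needs room for one pair */
    while (pos < chunk_len) {
        size_t src_len = chunk_len - pos;  /* in: bytes still available */
        size_t dest_len = buf_out_size;    /* in: capacity of buf_out */
        toy_decode_to_buf(buf_out, &dest_len, chunk + pos, &src_len);
        sink_write(sink, buf_out, dest_len);  /* frees up buf_out */
        pos += src_len;                       /* skip the consumed input */
    }
}
```

Calling decode_chunk on the 3-byte input "abc" with a 4-byte output buffer takes two passes through the inner loop: the first consumes 2 bytes and produces 4, the second consumes the last byte and produces 2. With the real decoder the same pattern should apply, since the decoder state is kept in the CLzmaDec object between calls.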

An additional omission is the finalization. At the end of the input file, LzmaDec_DecodeToBuf might need to be called one last time with a different finish mode (LZMA_FINISH_END instead of LZMA_FINISH_ANY).
