Siler

Reputation: 9514

zlib inflate stream and avail_in

Part of an application I'm working on involves receiving a compressed data stream in zlib (deflate) format, piece by piece over a socket. The routine is basically to receive the compressed data in chunks, and pass it to inflate as more data becomes available. When inflate returns Z_STREAM_END we know the full object has arrived.

A very simplified version of the basic C++ inflater function is as follows:

void inflater::inflate_next_chunk(void* chunk, std::size_t size)
{
   m_strm.avail_in = size;
   m_strm.next_in = static_cast<Bytef*>(chunk);
   m_strm.next_out = m_buffer;

   int ret = inflate(&m_strm, Z_NO_FLUSH);
   /* ... check errors, etc. ... */
}

Strangely, though, about once in every 40 or so calls, inflate fails with a Z_DATA_ERROR.

According to the zlib manual, a Z_DATA_ERROR indicates a "corrupt or incomplete" stream. Obviously, there are any number of ways the data could be getting corrupted in my application that are way beyond the scope of this question - but after some tinkering around, I realized that the call to inflate would return Z_DATA_ERROR if m_strm.avail_in was not 0 before I set it to size. In other words, it seems that inflate is failing because there is already data in the stream before I set avail_in.

But my understanding is that every call to inflate should completely empty the input stream, meaning that when I call inflate again, I shouldn't have to worry if it didn't finish up with the last call. Is my understanding correct here? Or do I always need to check strm.avail_in to see if there is pending input?

Also, why would there ever be pending input? Why doesn't inflate simply consume all available input with each call?

Upvotes: 3

Views: 3811

Answers (2)

Len Holgate

Reputation: 21644

inflate() can return because it has filled the output buffer without consuming all of the input data. If this happens you need to provide a new output buffer and call inflate() again until m_strm.avail_in == 0.

The zlib manual has this to say...

The detailed semantics are as follows. inflate performs one or both of the following actions:

Decompress more input starting at next_in and update next_in and avail_in accordingly. If not all input can be processed (because there is not enough room in the output buffer), next_in is updated and processing will resume at this point for the next call of inflate().

You appear to be assuming that the decompressed data from each chunk will always fit in your output buffer space; that's not always the case...

My wrapper code looks like this...

bool CDataInflator::Inflate(
   const BYTE * const pDataIn,
   DWORD &dataInSize,
   BYTE *pDataOut,
   DWORD &dataOutSize)
{
   if (pDataIn)
   {
      if (m_stream.avail_in == 0)
      {
         m_stream.avail_in = dataInSize;
         m_stream.next_in = const_cast<BYTE * const>(pDataIn);
      }
      else
      {
         throw CException(
            _T("CDataInflator::Inflate()"),
            _T("No space for input data"));
      }
   }

   m_stream.avail_out = dataOutSize;
   m_stream.next_out = pDataOut;

   bool done = false;

   do
   {
      int result = inflate(&m_stream, Z_BLOCK);

      if (result < 0)
      {
         ThrowOnFailure(_T("CDataInflator::Inflate()"), result);
      }

      done = (m_stream.avail_in == 0 || 
             (dataOutSize != m_stream.avail_out &&
              m_stream.avail_out != 0));
   }
   while (!done && m_stream.avail_out == dataOutSize);

   dataInSize = m_stream.avail_in;

   dataOutSize = dataOutSize - m_stream.avail_out;

   return done;
}

Note the loop and the fact that the caller relies on the dataInSize to know when all of the current input data has been consumed. If the output space is filled then the caller calls again using Inflate(0, 0, pNewBuffer, newBufferSize); to provide more buffer space...

Upvotes: 3

Alex Reynolds

Reputation: 97004

Consider wrapping the inflate() call in a do-while loop that repeats for as long as inflate() fills the output buffer completely (avail_out == 0), since a full buffer means more output may still be pending:

m_strm.avail_in = fread(compressed_data_buffer, 1, some_chunk_size / 8, some_file_pointer);
m_strm.next_in = compressed_data_buffer;
do {
   /* hand inflate() a fresh, empty output buffer on every pass */
   m_strm.avail_out = some_chunk_size;
   m_strm.next_out = inflated_data_buffer;
   int ret = inflate(&m_strm, Z_NO_FLUSH);
   /* error checking... */
   inflated_bytes = some_chunk_size - m_strm.avail_out;
   /* consume inflated_bytes bytes of inflated_data_buffer here,
      before the next pass overwrites them */
} while (m_strm.avail_out == 0);

Without debugging the internal workings of inflate(), I suspect it may on occasion simply need to run more than once before it can extract usable data.

Upvotes: 0
