Reputation: 9514
Part of an application I'm working on involves receiving a compressed data stream in zlib (deflate) format, piece by piece, over a socket. The routine is basically to receive the compressed data in chunks and pass each chunk to inflate as more data becomes available. When inflate returns Z_STREAM_END, we know the full object has arrived.
A very simplified version of the basic C++ inflater function is as follows:
void inflater::inflate_next_chunk(void* chunk, std::size_t size)
{
    m_strm.avail_in = static_cast<uInt>(size);
    m_strm.next_in = static_cast<Bytef*>(chunk);  // C++ needs an explicit cast from void*
    m_strm.next_out = m_buffer;
    int ret = inflate(&m_strm, Z_NO_FLUSH);
    /* ... check errors, etc. ... */
}
Strangely, though, every 40 or so calls, inflate fails with Z_DATA_ERROR.
According to the zlib manual, Z_DATA_ERROR indicates a "corrupt or incomplete" stream. Obviously, there are any number of ways the data could be getting corrupted in my application that are well beyond the scope of this question, but after some tinkering I realized that the call to inflate returns Z_DATA_ERROR whenever m_strm.avail_in was not 0 before I set it to size. In other words, it seems that inflate fails when there is already unconsumed data in the stream before I set avail_in.
But my understanding is that every call to inflate should completely empty the input stream, so that when I call inflate again, I shouldn't have to worry about whether it finished with the last call. Is my understanding correct? Or do I always need to check strm.avail_in to see if there is pending input?
Also, why would there ever be pending input? Why doesn't inflate simply consume all available input with each call?
Upvotes: 3
Views: 3811
Reputation: 21644
inflate() can return because it has filled the output buffer before consuming all of the input data. If this happens, you need to provide a new output buffer and call inflate() again until m_strm.avail_in == 0.
The zlib manual has this to say:
The detailed semantics are as follows. inflate performs one or both of the following actions:
Decompress more input starting at next_in and update next_in and avail_in accordingly. If not all input can be processed (because there is not enough room in the output buffer), next_in is updated and processing will resume at this point for the next call of inflate().
You appear to be assuming that the decompressed output of each input chunk will always fit in your output buffer; that's not always the case.
My wrapper code looks like this:
bool CDataInflator::Inflate(
    const BYTE * const pDataIn,
    DWORD &dataInSize,
    BYTE *pDataOut,
    DWORD &dataOutSize)
{
    if (pDataIn)
    {
        if (m_stream.avail_in == 0)
        {
            m_stream.avail_in = dataInSize;
            m_stream.next_in = const_cast<BYTE * const>(pDataIn);
        }
        else
        {
            throw CException(
                _T("CDataInflator::Inflate()"),
                _T("No space for input data"));
        }
    }

    m_stream.avail_out = dataOutSize;
    m_stream.next_out = pDataOut;

    bool done = false;

    do
    {
        int result = inflate(&m_stream, Z_BLOCK);

        if (result < 0)
        {
            ThrowOnFailure(_T("CDataInflator::Inflate()"), result);
        }

        done = (m_stream.avail_in == 0 ||
            (dataOutSize != m_stream.avail_out &&
             m_stream.avail_out != 0));
    }
    while (!done && m_stream.avail_out == dataOutSize);

    dataInSize = m_stream.avail_in;
    dataOutSize = dataOutSize - m_stream.avail_out;

    return done;
}
Note the loop, and the fact that the caller relies on dataInSize to know when all of the current input data has been consumed. If the output space is filled, the caller calls again with a null input pointer, e.g. Inflate(0, dataInSize, pNewBuffer, newBufferSize);, to provide more buffer space.
Upvotes: 3
Reputation: 97004
Consider wrapping the inflate() call in a do-while loop that repeats until the stream's avail_out is no longer zero (i.e., until a call leaves part of the output buffer unused):
m_strm.avail_in = fread(compressed_data_buffer, 1, some_chunk_size / 8, some_file_pointer);
m_strm.next_in = compressed_data_buffer;
do {
    m_strm.avail_out = some_chunk_size;
    m_strm.next_out = inflated_data_buffer;
    int ret = inflate(&m_strm, Z_NO_FLUSH);
    /* error checking... */
    inflated_bytes = some_chunk_size - m_strm.avail_out;
    /* consume the inflated_bytes bytes in inflated_data_buffer here,
       before the next pass overwrites them */
} while (m_strm.avail_out == 0);
Without debugging the internal workings of inflate(), I suspect it may on occasion simply need to run more than once before it can extract usable data.
Upvotes: 0