Reputation: 993
I have the following C++ code which uses zlib to compress a memory buffer into a gzip encoded stream:
void compress(const std::vector<char>& src)
{
    static constexpr int DEFAULT_WINDOW_BITS = 15;
    static constexpr int GZIP_WINDOW_BITS = DEFAULT_WINDOW_BITS + 16;
    static constexpr int GZIP_MEM_LEVEL = 8;

    z_stream stream;
    const auto srcData = reinterpret_cast<unsigned char*>(const_cast<char*>(src.data()));
    stream.zalloc = Z_NULL;
    stream.zfree = Z_NULL;
    stream.opaque = Z_NULL;
    stream.next_in = srcData;
    stream.avail_in = src.size();

    auto result = deflateInit2(&stream,
                               Z_DEFAULT_COMPRESSION,
                               Z_DEFLATED,
                               GZIP_WINDOW_BITS,
                               GZIP_MEM_LEVEL,
                               Z_DEFAULT_STRATEGY);
    if (result == Z_OK)
    {
        std::vector<char> dest(deflateBound(&stream, stream.avail_in));
        const auto destData = reinterpret_cast<unsigned char*>(dest.data());
        stream.next_out = destData;
        stream.avail_out = dest.size();

        result = deflate(&stream, Z_FINISH);
        if (result == Z_STREAM_END)
        {
            std::cout << "Original: " << src.size() << "; compressed: " << dest.size() << std::endl;
        }
        else
        {
            std::cerr << "Error when compressing: code " << std::to_string(result);
        }

        result = deflateEnd(&stream);
        if (result != Z_OK)
        {
            std::cerr << "Error: Cannot destroy deflate stream: code " << std::to_string(result) << std::endl;
        }
    }
    else
    {
        std::cerr << "Error: Cannot initialize deflate stream: code " << std::to_string(result) << std::endl;
    }
}
While the function finishes successfully, I'm getting no compression at all. In fact, for a 3MB file consisting of just the character 'a' repeated multiple times, I get the following:
Original: 3205841; compressed: 3206843
Am I doing something wrong?
(Note that this is a simplified version of the original code; in practice, I'd be using RAII and exceptions for resource and error handling.)
Upvotes: 1
Views: 552
Reputation: 112239
The comments on the question already contain the answer, so I am recording them here for posterity.
dest.size() is not, and cannot be, changed by deflate(). All dest.size() tells you is the size of your output buffer before compression. To get the size of the compressed result, you need to look at something that comes back from the deflate() call: either dest.size() - stream.avail_out or stream.total_out.
Doing the compression in a single call means that both the input and output buffer sizes must fit in an unsigned, which is usually 32 bits, so you are limited to compressing about 4 GB of data. If you might need to handle more, you need a loop that calls deflate() on smaller chunks, possibly much smaller ones, on the order of tens or hundreds of kilobytes. That is the usual way to use deflate(), since it takes much less memory and keeps your routine from being a resource hog.
deflateBound() exists specifically to support a single deflate() call. It provides an upper bound on the possible compressed size, which can be slightly larger than the input. That worst case occurs when the input is incompressible, e.g. already compressed or random.
Upvotes: 2