hasuchobe

Reputation: 21

Using boost filtering_streambuf on large data

I'm trying to compress some data using Boost gzip compression via filtering_streambuf. The compressed version is then written to disk. The problem is that the data is over 10 GB in size, and I believe stringstream is running out of space. Assuming I can break this data up into pieces, what's the right way of using stringstream and filtering_streambuf to compress all my data?

I've tried breaking the data up into pieces, setting the max chunk size to std::string::max_size()/2 and pushing several stringstream objects to the filtering_streambuf object, but that doesn't seem to be how filtering_streambuf works :) I've also tried copying each chunk of data using bio::copy() repeatedly. I've attached sample code that isn't my exact code (I don't have access to it at the moment), but the idea is the same, except that compressed is a file stream. It's possible something I mentioned actually works and I just have a bug in my code, in which case I'll find the bug. I just need to know what's considered the correct approach for compressing a large chunk of data.

EDIT: Added the actual code I've written. For some reason this doesn't compile, because write is apparently not a valid function. I also can't declare a filtering_ostream. Maybe this version of Boost is old? The variables being written are chars.

boost::iostreams::filtering_streambuf<boost::iostreams::output> out;
out.push(boost::iostreams::gzip_compressor());
out.push(boost::iostreams::file_sink(fileName.c_str()));

out.write(&sizeof_sizet, 1);
out.write(&sizeof_int, 1);
out.write(&sizeof_double, 1);
out.write(&sizeof_Int, 1);

EDIT 2: This might be what I'm trying to achieve. It compiles, but I haven't tested it yet.

boost::iostreams::filtering_ostreambuf buf;
buf.push(boost::iostreams::gzip_compressor());
buf.push(boost::iostreams::file_sink(fileName.c_str()));

std::ostream out(&buf);

out.write(&sizeof_sizet, 1);
out.write(&sizeof_int, 1);
out.write(&sizeof_double, 1);
out.write(&sizeof_Int, 1);

Upvotes: 2

Views: 2435

Answers (1)

Dan Mašek

Reputation: 19041

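Use a filtering_ostream (which is a std::ostream, so write() is available) instead of a filtering_streambuf, and end the chain with a file_sink so the compressed data is written directly to the file, avoiding having to buffer the entire compressed result in memory until completion.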

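#include <boost/iostreams/device/file.hpp>
#include <boost/iostreams/filtering_stream.hpp>

#include <boost/iostreams/filter/gzip.hpp>

#include <ios>
#include <string>

int main()
{
    boost::iostreams::filtering_ostream out;
    out.push(boost::iostreams::gzip_compressor());
    // Binary mode so the compressed bytes are not newline-translated on Windows.
    out.push(boost::iostreams::file_sink("test.gz", std::ios_base::out | std::ios_base::binary));

    std::string test_string("FOO BAR BAZ....\n");

    // Write only the characters; size() + 1 would also store the terminating '\0'.
    out.write(test_string.c_str(), test_string.size());
}   // destroying `out` flushes and closes the chain, which writes the gzip trailer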

I can run it, and then try to decompress the file it created:

>ls test.gz
ls: test.gz: No such file or directory

>test.exe

>ls test.gz
test.gz

>gzip -cd test.gz
FOO BAR BAZ....

Upvotes: 2
