Sebastian

Reputation: 21

Problem with handling large data using Boost iostreams

I need to handle a large amount of data in memory (without using files/fstream), and I know that the VS implementation of streambuf doesn't allow for that because it uses a 32-bit counter (https://github.com/microsoft/STL/issues/388). I thought that maybe Boost could help me, but apparently it doesn't handle that properly either (or maybe I'm missing something).

#include <cstdint>
#include <iostream>
#include <vector>
#include <boost/iostreams/device/array.hpp>
#include <boost/iostreams/stream.hpp>

namespace bs = boost::iostreams;

int main()
{
    uint64_t mb1 = 1024 * 1024;
    uint64_t gb1 = 1024 * mb1;
    uint64_t mbToCopy = 2048;

    std::vector<char> iBuffer(mb1);
    std::vector<char> oBuffer(4 * gb1);
    bs::stream<bs::array_sink> oStr(oBuffer.data(), oBuffer.size());

    for (uint64_t i = 0; i < mbToCopy; i++) {
        oStr.write(iBuffer.data(), iBuffer.size());
    }
    std::cout << oStr.tellp() << std::endl; // (1)
    oStr.seekp(0, std::ios_base::beg);
    std::cout << oStr.tellp() << std::endl; // (2)
}

This code works fine as long as mbToCopy is no larger than 2048, and the output is:

2147483648
0

When I change mbToCopy to 2049 the output is:

2148532224
4294967296

As you can see, when I try to move back to the beginning of the stream (this is just example usage; I need to be able to reposition to any place in the stream), it places me way beyond the current size of the stream and the stream becomes unreliable. What's more, when I keep mbToCopy set to 2049 and reduce the size of oBuffer to 3 GB, oStr.seekp starts crashing.
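
The two outputs are consistent with a repositioning offset being narrowed to a signed 32-bit value somewhere inside the stream buffer (`std::streambuf::pbump`, for instance, takes an `int`). The sketch below only models that arithmetic. The helper name `seek_with_int_offset` is hypothetical, and it is an assumption, not something confirmed here, that the Boost implementation repositions this way.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: move from `current` to `target` by applying the
// difference as a 32-bit signed offset, the way a pbump(int)-style call would.
std::int64_t seek_with_int_offset(std::int64_t current, std::int64_t target)
{
    assert(current >= 0 && target >= 0);

    std::int64_t const delta = target - current;

    // Keep only the low 32 bits, then reinterpret them as signed.
    // (The conversion to int32_t is modular since C++20 and behaves the
    // same way on mainstream compilers before that.)
    auto const narrowed =
        static_cast<std::int32_t>(static_cast<std::uint32_t>(delta));

    return current + narrowed;
}
```

With `current = 2148532224` and `target = 0` the modelled result is 4294967296, and with `current = 2147483648` it is 0, which matches both outputs above.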

Any idea if Boost provides other solutions that could help in my case?

Upvotes: 1

Views: 82

Answers (1)

sehe

Reputation: 392833

I would suggest not using streams here at all. They seem to introduce unnecessary overhead:

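#include <algorithm> // std::copy_n
#include <cassert>
#include <cstddef>   // size_t
#include <iostream>
#include <iterator>  // std::next, std::distance
#include <vector>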

static inline auto operator""_kb(unsigned long long v) { return v << 10ull; }
static inline auto operator""_mb(unsigned long long v) { return v << 20ull; }
static inline auto operator""_gb(unsigned long long v) { return v << 30ull; }

int main()
{
    std::vector<char> iBuffer(1_mb);
    std::vector<char> oBuffer(12_gb);

    auto pos = oBuffer.begin();
    for (size_t i = 0; i < 8192; i++) {
        assert(std::next(pos, iBuffer.size()) <= oBuffer.end());
        pos = std::copy_n(iBuffer.begin(), iBuffer.size(), pos);
    }

    auto tellp = [&] { return std::distance(oBuffer.begin(), pos); };
    auto seekp = [&](size_t from_beg) { pos = std::next(oBuffer.begin(), from_beg); };

    std::cout << tellp() << std::endl; // (1)
    seekp(0);
    std::cout << tellp() << std::endl; // (2)
}

On my system this prints, without any problem:

8589934592
0

Of course I introduced the tellp()/seekp() helpers only to make the code as similar as possible. You could also just write:

auto const beg = oBuffer.begin();
auto       pos = beg;

for (size_t i = 0; i < 8192; i++) {
    assert(std::next(pos, iBuffer.size()) <= oBuffer.end());
    pos = std::copy_n(iBuffer.begin(), iBuffer.size(), pos);
}

std::cout << (pos-beg) << std::endl; // (1)
pos = beg;
std::cout << (pos-beg) << std::endl; // (2)

With exactly the same output.
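
If you still want a stream-like interface with arbitrary repositioning, the helpers above could be wrapped in a small class. This is only a minimal sketch: `BufferCursor` is a hypothetical name, not something Boost or the standard library provides, and it assumes the caller owns the vector and does not resize it while the cursor is in use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: a write cursor over a caller-owned buffer.
// Positions are std::size_t, so they are 64-bit on 64-bit platforms.
class BufferCursor {
  public:
    explicit BufferCursor(std::vector<char>& buffer) : buf_(buffer) {}

    // Copies n bytes from src to the current position and advances it.
    void write(char const* src, std::size_t n)
    {
        assert(pos_ <= buf_.size()); // holds unless the vector shrank
        if (n > buf_.size() - pos_)
            throw std::out_of_range("BufferCursor::write: past end of buffer");
        std::copy_n(src, n, buf_.data() + pos_);
        pos_ += n;
    }

    // Current position, counted in bytes from the start of the buffer.
    std::size_t tellp() const { return pos_; }

    // Moves to an absolute position; one past the last byte is allowed.
    void seekp(std::size_t from_beg)
    {
        if (from_beg > buf_.size())
            throw std::out_of_range("BufferCursor::seekp: past end of buffer");
        pos_ = from_beg;
    }

  private:
    std::vector<char>& buf_;
    std::size_t        pos_ = 0;
};
```

Usage mirrors the original loop: construct `BufferCursor out(oBuffer);`, call `out.write(iBuffer.data(), iBuffer.size());` repeatedly, then use `out.tellp()` and `out.seekp(0)`. Throwing instead of asserting on overflow is a design choice here, so that the bounds check also applies in release builds.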

Upvotes: 0
