DavidZi

Reputation: 335

rdbuf()->pubsetbuf() on a bidirectional fstream is applied only to writes

I am trying to implement fast processing of large files using Visual Studio 2019. Data should be read, processed, and then written to the end of the same file. After some testing, I found that a file buffer of 1MB seems to be the best option on my hardware.

Here, I'm trying to set it to 1MB:

#include <fstream>
#include <array>
#include <memory>

using namespace std;

int main()
{
    const streamsize BUFFER_SIZE = 1 * 1024 * 1024;
    unique_ptr<array<char, BUFFER_SIZE>> buffer = make_unique<array<char, BUFFER_SIZE>>();

    const streamsize FILE_BUFFER_SIZE = 1 * 1024 * 1024;
    unique_ptr<array<char, FILE_BUFFER_SIZE>> file_buffer = make_unique<array<char, FILE_BUFFER_SIZE>>();

    ios::sync_with_stdio(false);

    fstream stream;
    stream.rdbuf()->pubsetbuf(file_buffer->data(), file_buffer->size());
    stream.open(R"(C:\test\test_file.bin)", ios::in | ios::out | ios::binary);

    while (stream.good())
    {
        stream.read(buffer->data(), buffer->size());

        // Some data processing and writes here
    }   
}

While monitoring the program with Sysinternals Process Monitor, I can see that the WriteFile function is indeed called with a 1MB buffer, but the ReadFile function is called 256 times per loop iteration with only a 4K buffer. This leads to much worse performance.

I've googled this problem and found no similar cases. I would appreciate any help on this.

Upvotes: 0

Views: 510

Answers (2)

Alan Birtles

Reputation: 36459

The behaviour of setbuf isn't very well specified: https://en.cppreference.com/w/cpp/io/basic_filebuf/setbuf

According to cppreference (which matches my experience), libstdc++ only uses the buffer if you call pubsetbuf before opening the file, while Visual Studio only uses the buffer if it is passed after opening the file. Therefore, for cross-platform code that has a reasonable chance (but no guarantee) of using your buffer, you should do:

fstream stream;
stream.rdbuf()->pubsetbuf(file_buffer->data(), file_buffer->size());
stream.open(R"(C:\test\test_file.bin)", ios::in | ios::out | ios::binary);
stream.rdbuf()->pubsetbuf(file_buffer->data(), file_buffer->size());
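
As a minimal sketch, the same pattern can be wrapped in a small function. Note that open_with_buffer is a hypothetical helper name (not a standard API), and std::vector is used here in place of the question's unique_ptr<array> purely for brevity. The buffer must outlive the stream, and whether it is actually used remains implementation-defined.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: opens `path` for binary read/write and offers `buffer`
// to the stream both before and after open(). Returns true if the file was
// opened. `buffer` must stay alive for as long as `stream` is in use.
bool open_with_buffer(std::fstream& stream, const std::string& path,
                      std::vector<char>& buffer)
{
    assert(!buffer.empty());
    const auto size = static_cast<std::streamsize>(buffer.size());

    // Request made before open(): the one libstdc++ is reported to honour.
    stream.rdbuf()->pubsetbuf(buffer.data(), size);

    stream.open(path, std::ios::in | std::ios::out | std::ios::binary);
    if (!stream.is_open())
        return false;

    // Request repeated after open(), before any I/O: the one Visual Studio
    // is reported to honour.
    stream.rdbuf()->pubsetbuf(buffer.data(), size);
    return true;
}
```

Declare the buffer before the fstream so that it is destroyed after the stream, then check afterwards (for example with Process Monitor, as in the question) that the read sizes really changed.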

Also note that you don't actually need to supply a buffer to pubsetbuf; you can just pass a null pointer:

fstream stream;
stream.rdbuf()->pubsetbuf(nullptr, BUFFER_SIZE);
stream.open(R"(C:\test\test_file.bin)", ios::in | ios::out | ios::binary);
stream.rdbuf()->pubsetbuf(nullptr, BUFFER_SIZE);

If you want to target libstdc++ in the future, it is also worth noting that the buffer size you pass needs to be 1 larger than the size you actually want.
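
A rough sketch of that sizing rule for a libstdc++ target, where the buffer is installed before open(). Both make_padded_buffer and count_bytes are hypothetical names made up for illustration, and the extra byte relies on the behaviour described on the cppreference page linked above, so treat it as an assumption to verify on your toolchain.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: allocate one spare byte on top of the transfer size
// we actually want.
std::vector<char> make_padded_buffer(std::size_t desired_size)
{
    return std::vector<char>(desired_size + 1);
}

// Hypothetical example: counts the bytes in `path`, reading in chunks of
// `desired_size` through a padded buffer installed before open().
// Returns 0 if the file cannot be opened or desired_size is 0.
std::size_t count_bytes(const std::string& path, std::size_t desired_size)
{
    if (desired_size == 0)
        return 0;

    // Declared before the stream so it outlives the stream.
    std::vector<char> file_buffer = make_padded_buffer(desired_size);
    std::vector<char> chunk(desired_size);

    std::ifstream in;
    in.rdbuf()->pubsetbuf(file_buffer.data(),
                          static_cast<std::streamsize>(file_buffer.size()));
    in.open(path, std::ios::binary);

    std::size_t total = 0;
    // read() fails on the final partial chunk, but gcount() still reports
    // how many bytes were delivered, so those bytes are counted as well.
    while (in.read(chunk.data(), static_cast<std::streamsize>(chunk.size()))
           || in.gcount() > 0)
        total += static_cast<std::size_t>(in.gcount());
    return total;
}
```

On Visual Studio the pre-open request is expected to be dropped, so this variant only makes sense combined with the second pubsetbuf call after open().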

boost::iostreams gives you a little more direct control over buffer sizes.

Upvotes: 1

Joshua Clayton

Reputation: 1729

What you probably want is a memory mapped file, which is cached. You work against the buffered version of the file in memory, and it is eventually synchronized with the actual disk.

Here is a similar question that has been answered: "Is there a memory mapping api on windows platform, just like mmap() on linux?"

Upvotes: 0
