Reputation: 335
I am trying to implement fast processing of large files using Visual Studio 2019. Data should be read, processed and then written to the end of the same file. After running some tests, I found that a file buffer of 1MB seems to be the best option on my hardware.
Here, I'm trying to set it to 1MB:
#include <fstream>
#include <array>
#include <memory>
using namespace std;
int main()
{
    const streamsize BUFFER_SIZE = 1 * 1024 * 1024;
    unique_ptr<::array<char, BUFFER_SIZE>> buffer = make_unique<::array<char, BUFFER_SIZE>>();
    const streamsize FILE_BUFFER_SIZE = 1 * 1024 * 1024;
    unique_ptr<::array<char, FILE_BUFFER_SIZE>> file_buffer = make_unique<array<char, FILE_BUFFER_SIZE>>();
    ios::sync_with_stdio(false);
    fstream stream;
    stream.rdbuf()->pubsetbuf(file_buffer->data(), file_buffer->size());
    stream.open(R"(C:\test\test_file.bin)", ios::in | ios::out | ios::binary);
    while (stream.good())
    {
        stream.read(buffer->data(), buffer->size());
        // Some data processing and writes here
    }
}
While monitoring the program with Sysinternals Process Monitor, I can see that the WriteFile function is indeed called with a 1MB buffer, but the ReadFile function is called 256 times per loop iteration with only a 4K buffer. This leads to much worse performance.
I've googled this problem and found no similar cases. I would appreciate any help on this.
Upvotes: 0
Views: 510
Reputation: 36459
The behaviour of setbuf isn't very well specified: https://en.cppreference.com/w/cpp/io/basic_filebuf/setbuf
According to cppreference (which matches my experience), libstdc++ only uses the buffer if you call pubsetbuf before opening the file, whereas Visual Studio only uses the buffer if it is passed after opening the file. Therefore, for cross-platform code which has a reasonable chance (but no guarantee) of using your buffer, you should do:
fstream stream;
stream.rdbuf()->pubsetbuf(file_buffer->data(), file_buffer->size());
stream.open(R"(C:\test\test_file.bin)", ios::in | ios::out | ios::binary);
stream.rdbuf()->pubsetbuf(file_buffer->data(), file_buffer->size());
Also note you don't need to actually supply a buffer to pubsetbuf; you can just pass a null pointer:
fstream stream;
stream.rdbuf()->pubsetbuf(nullptr, BUFFER_SIZE);
stream.open(R"(C:\test\test_file.bin)", ios::in | ios::out | ios::binary);
stream.rdbuf()->pubsetbuf(nullptr, BUFFER_SIZE);
If you want to target libstdc++ in the future, it is also worth noting that your buffer needs to be one byte larger than your desired size (cppreference notes that, with a user-provided buffer, libstdc++ reads n-1 bytes at a time).
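To put the call-order and size advice together, here is a minimal sketch. The helper name open_with_buffer and its parameters are made up for illustration and are not part of any library; it only assumes the behaviour described above, so whether a given standard library actually uses the buffer is still not guaranteed.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (not a library function): opens `path` for binary
// read/write and asks the stream to use a buffer of `desired` bytes.
// `storage` is handed to the stream, so it must outlive `stream`
// (declare it before the stream in the calling code).
bool open_with_buffer(std::fstream& stream, std::vector<char>& storage,
                      const std::string& path, std::size_t desired)
{
    assert(desired > 0);
    // One extra byte, following the libstdc++ note above.
    storage.resize(desired + 1);
    const auto n = static_cast<std::streamsize>(storage.size());

    // First request: the one libstdc++ is expected to honour (before open).
    stream.rdbuf()->pubsetbuf(storage.data(), n);
    stream.open(path, std::ios::in | std::ios::out | std::ios::binary);
    if (!stream.is_open())
        return false;
    // Second request: the one Visual Studio is expected to honour
    // (after open, before any I/O has taken place).
    stream.rdbuf()->pubsetbuf(storage.data(), n);
    return true;
}
```

In the question's program this would replace the pubsetbuf/open pair, for example open_with_buffer(stream, storage, R"(C:\test\test_file.bin)", 1024 * 1024). It is worth re-checking with Process Monitor afterwards, since the standard gives no guarantee either way.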
boost::iostreams gives you a little more direct control over buffer sizes.
Upvotes: 1
Reputation: 1729
What you probably want is a memory mapped file, which is cached. You work against the buffered version of the file in memory, and it is eventually synchronized with the actual disk.
Here is a similar question that has already been answered: Is there a memory mapping api on windows platform, just like mmap() on linux?
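As a rough illustration of what working against a mapped file looks like, here is a minimal sketch. The function add_to_every_byte is a made-up example, not an existing API; it uses CreateFileMapping/MapViewOfFile on Windows and mmap elsewhere, and it only modifies an existing, non-empty file in place. Appending to the end of the file, as in the question, would additionally require growing the file and remapping, which this sketch does not attempt.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#ifdef _WIN32
#include <windows.h>
#else
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#endif

// Hypothetical example: maps an existing, non-empty file read/write and adds
// `delta` to every byte in place. Returns false if any step fails.
bool add_to_every_byte(const std::string& path, unsigned char delta)
{
    assert(!path.empty());
#ifdef _WIN32
    HANDLE file = CreateFileA(path.c_str(), GENERIC_READ | GENERIC_WRITE, 0,
                              nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file == INVALID_HANDLE_VALUE) return false;
    bool ok = false;
    LARGE_INTEGER size{};
    HANDLE mapping = nullptr;
    if (GetFileSizeEx(file, &size) && size.QuadPart > 0)
        mapping = CreateFileMappingA(file, nullptr, PAGE_READWRITE, 0, 0, nullptr);
    if (mapping) {
        // A length of 0 maps the whole file.
        void* view = MapViewOfFile(mapping, FILE_MAP_ALL_ACCESS, 0, 0, 0);
        if (view) {
            auto* bytes = static_cast<unsigned char*>(view);
            for (LONGLONG i = 0; i < size.QuadPart; ++i)
                bytes[i] = static_cast<unsigned char>(bytes[i] + delta);
            UnmapViewOfFile(view);
            ok = true;
        }
        CloseHandle(mapping);
    }
    CloseHandle(file);
    return ok;
#else
    int fd = open(path.c_str(), O_RDWR);
    if (fd < 0) return false;
    struct stat st{};
    if (fstat(fd, &st) != 0 || st.st_size <= 0) { close(fd); return false; }
    const std::size_t size = static_cast<std::size_t>(st.st_size);
    void* view = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (view == MAP_FAILED) return false;
    auto* bytes = static_cast<unsigned char*>(view);
    for (std::size_t i = 0; i < size; ++i)
        bytes[i] = static_cast<unsigned char>(bytes[i] + delta);
    munmap(view, size);  // dirty pages are written back by the OS
    return true;
#endif
}
```

The loop touches the file contents through an ordinary pointer; the operating system pages data in and writes it back, so there is no stream buffer size to tune.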
Upvotes: 0