user11914177

Reputation: 955

Reading and writing files fast in C++

I'm trying to read and write a few megabytes of data stored in files, consisting of 8 floats converted to strings per line, to my SSD. Looking up C++ code and implementing some of the answers here yielded this code for reading a file:

#include <fstream>
#include <sstream>

std::stringstream file;
std::fstream stream;
stream.open("file.txt", std::fstream::in);
file << stream.rdbuf(); // copy the whole file into the stringstream
stream.close();

And this code for writing files:

stream.write(file.str().data(), file.tellp()); // tellp(), not tellg(): the put position holds the buffered size

The problem is that this code is very slow compared to the speed of my SSD. My SSD has a read speed of 2400 MB/s and a write speed of 1800 MB/s, but my program only reads at 180.6 MB/s and writes at 25.11 MB/s.

Because some asked how I measure the speed: I obtain a std::chrono::steady_clock::time_point using std::chrono::steady_clock::now() before and after the operation and compute the difference with std::chrono::duration_cast. Using the same 5.6 MB file and dividing the file size by the measured time, I get megabytes per second.
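In outline, the measurement looks like this (a sketch reconstructed from the description above; the hard-coded file size is only for illustration):

#include <chrono>
#include <iostream>

int main() {
    auto start = std::chrono::steady_clock::now();
    // ... read or write the 5.6 MB file here ...
    auto end = std::chrono::steady_clock::now();
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
    std::cout << (5.6 /* file size in MB */) / (ms / 1000.0) << " MB/s\n";
}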

How can I increase the speed of reading from and writing to files, while using only standard C++ and the STL?

Upvotes: 0

Views: 6238

Answers (4)

Thomas Matthews

Reputation: 57678

Looks like you are outputting formatted numbers to a file. There are two bottlenecks already: formatting the numbers into human readable form and the file I/O.

The best performance you can achieve is to keep the data flowing. Starting and stopping requires overhead penalties.

I recommend double buffering with two or more threads.

One thread formats the data into one or more buffers. Another thread writes the buffers to the file. You'll need to adjust the size and quantity of the buffers to keep the data flowing. When one thread finishes a buffer, it starts processing the next. For example, the writing thread could use fstream::write() to write an entire buffer at once, as sketched below.

The double buffering with threads can also be adapted for reading: one thread reads the data from the file into one or more buffers, while another thread parses the buffers into the internal format.
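Something along these lines (a minimal sketch of the described scheme; the file name, buffer size, and sample data are assumptions, not part of the answer):

#include <condition_variable>
#include <cstddef>
#include <fstream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

int main() {
    std::queue<std::string> full;      // buffers formatted and ready to write
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    // Writer thread: drains finished buffers to the file with one write() each.
    std::thread writer([&] {
        std::ofstream os("out.txt", std::ios::binary); // assumed output file
        for (;;) {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [&] { return !full.empty() || done; });
            if (full.empty()) break;           // done and nothing left to write
            std::string buf = std::move(full.front());
            full.pop();
            lock.unlock();                     // formatting can continue meanwhile
            os.write(buf.data(), static_cast<std::streamsize>(buf.size()));
        }
    });

    // Formatting thread (here simply the main thread): fills buffers with text.
    constexpr std::size_t bufferSize = 1 << 20; // ~1 MiB per buffer, tune as needed
    std::string buf;
    buf.reserve(bufferSize);
    for (int i = 0; i < 2'000'000; ++i) {       // sample data, stands in for the floats
        buf += std::to_string(i * 0.5);
        buf += '\n';
        if (buf.size() >= bufferSize - 64) {    // buffer nearly full: hand it off
            {
                std::lock_guard<std::mutex> lock(m);
                full.push(std::move(buf));
            }
            cv.notify_one();
            buf = std::string();
            buf.reserve(bufferSize);
        }
    }
    {
        std::lock_guard<std::mutex> lock(m);
        if (!buf.empty()) full.push(std::move(buf));
        done = true;
    }
    cv.notify_one();
    writer.join();
}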

Upvotes: 0

Ted Lyngmo

Reputation: 117168

You can try to copy the whole file at once and see if that improves the speed:

#include <algorithm>
#include <fstream>
#include <iterator>

int main() {
    std::ifstream is("infile");
    std::ofstream os("outfile");

    std::copy(std::istreambuf_iterator<char>(is), std::istreambuf_iterator<char>{},
              std::ostreambuf_iterator<char>(os));

    // or simply: os << is.rdbuf()
}

Upvotes: 2

A M

Reputation: 15277

I made a short evaluation for you.

I have written a test program that first creates a test file.

Then I applied several improvements:

  1. I switch on all compiler optimizations (see the example compile command below)
  2. For the string, I use resize to avoid reallocations
  3. Reading from the stream is drastically improved by setting a bigger input buffer

Please check whether you can implement one of these ideas in your solution.
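For point 1, optimizations are enabled at build time, e.g. (assuming GCC or Clang; the file name is a placeholder, and MSVC would use /O2 instead):

g++ -O3 -o test test.cpp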


Edit

Stripped the test program down to pure reading:

#include <string>
#include <iterator>
#include <iostream>
#include <fstream>
#include <chrono>
#include <algorithm>

constexpr size_t NumberOfExpectedBytes = 80'000'000;
constexpr size_t SizeOfIOStreamBuffer = 1'000'000;
static char ioBuffer[SizeOfIOStreamBuffer];

const std::string fileName{ "r:\\log.txt" };

void writeTestFile() {
    if (std::ofstream ofs(fileName); ofs) {
        for (size_t i = 0; i < 2'000'000; ++i)
            ofs << "text,text,text,text,text,text," << i << "\n";
    }
}


int main() {

    //writeTestFile();

    // Make string with big buffer
    std::string completeFile{};
    completeFile.resize(NumberOfExpectedBytes);

    if (std::ifstream ifs(fileName); ifs) {

        // Increase buffer size for buffered input
        ifs.rdbuf()->pubsetbuf(ioBuffer, SizeOfIOStreamBuffer);

        // Time measurement start
        auto start = std::chrono::system_clock::now();

        // Read complete file (note: the file must not exceed NumberOfExpectedBytes,
        // or std::copy writes past the end of completeFile)
        std::copy(std::istreambuf_iterator<char>(ifs), {}, completeFile.begin());

        // Time measurement evaluation
        auto end = std::chrono::system_clock::now();
        auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
        // How long did it take?
        std::cout << "Elapsed time:       " << elapsed.count() << " ms\n";
    }
    else std::cerr << "\n*** Error.  Could not open source file\n";

    return 0;
}

With that, I achieve 123.2 MB/s.

Upvotes: 2

3Dave

Reputation: 29041

In your sample, the slow part is likely the repeated calls to getline(). While this is somewhat implementation-dependent, typically a call to getline eventually boils down to an OS call to retrieve the next line of text from an open file. OS calls are expensive, and should be avoided in tight loops.

Consider a getline implementation that incurs ~1 ms of overhead. If you call it 1000 times, each reading ~80 characters, you've incurred a full second of overhead. If, on the other hand, you call it once and read 80,000 characters, you've eliminated 999 ms of overhead and the call will likely return nearly instantaneously.

(This is also one reason games and the like implement custom memory management rather than relying on malloc and new all over the place.)

For reading: Read the entire file at once, if it'll fit in memory.

See: How do I read an entire file into a std::string in C++?

Specifically, see the slurp answer towards the bottom. (And take to heart the comment about using a std::vector instead of a char[] array.)
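A minimal slurp along those lines (a sketch: it sizes the buffer from the stream and reads it in a single bulk call; error handling omitted, and "infile" is a placeholder name):

#include <cstddef>
#include <fstream>
#include <string>

std::string slurp(const std::string& path) {
    std::ifstream is(path, std::ios::binary | std::ios::ate);         // open positioned at the end
    std::string content(static_cast<std::size_t>(is.tellg()), '\0');  // size the buffer to the file
    is.seekg(0);
    is.read(&content[0], static_cast<std::streamsize>(content.size())); // one bulk read
    return content;
}

int main() {
    std::string data = slurp("infile");
    // ... parse data here ...
}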

If it won't all fit in memory, manage it in large chunks.

For writing: build your output in a stringstream or similar buffer, then write it in one step, or in large chunks, to minimize the number of OS round trips.
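The write side could look like this (a sketch: all output is accumulated in one string and handed to the OS in a single write; the file name and sample data are placeholders):

#include <fstream>
#include <string>

int main() {
    std::string out;
    out.reserve(8 * 1024 * 1024);             // reserve up front to avoid reallocations
    for (int i = 0; i < 1'000'000; ++i) {     // sample lines, standing in for the real data
        out += std::to_string(i * 0.125);
        out += '\n';
    }
    std::ofstream os("outfile", std::ios::binary);
    os.write(out.data(), static_cast<std::streamsize>(out.size())); // one bulk write
}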

Upvotes: 1
