There seems to be a consensus on Stack Overflow that if one reads a large file in full, sequential reading is fastest and multi-threading the reads is unlikely to give a benefit (e.g., 1, 2, and several more).
Yet in the code example below, multi-threaded reading is actually faster, and by a lot (I have seen 2x, and up to 3x with 1000 GB files). Why is that?
sequential: 41s
parallel: 27s
I am reading from a Samsung SSD 990 PRO 4TB on a 56-core Xeon w9-3495X system. When reading sequentially, the SSD active time is around 75%, so it is somewhat understandable that multi-threading can achieve higher rates. But why is the SSD active time not at 100% to begin with?
I also noticed that the CPU load of the process is 2% with 1 thread and 7% with 4 threads, both of which are close to 100% / 56 * nThreads, so maybe that is already the answer. Still, what keeps the CPU so busy during std::filebuf::sgetn? And is there a faster way to read the file that would improve single-threaded read performance as well? (One candidate is sketched after the listing below.)
#include <chrono>
#include <fstream>
#include <ios>
#include <iostream>
#include <memory>
#include <thread>
#include <vector>

// Create the test file with: fsutil file createnew 100GB 100000000000
constexpr auto filename = "100GB";
constexpr auto bufferSize = 6'000'000;
constexpr auto nThreads = 4;

// Runs the callback and prints its wall-clock duration in seconds.
template<typename Callback>
void timeit(const char * message, const Callback & callback) {
    using namespace std::chrono;
    std::cout << message << ": ";
    const auto start = high_resolution_clock::now();
    callback();
    std::cout << duration_cast<seconds>(high_resolution_clock::now() - start) << std::endl;
}

// Each thread reads every nThreads-th chunk of the file, starting at chunk
// iThread, so together the threads cover the whole file exactly once.
static void readFile(const size_t nThreads = 1, const size_t iThread = 0) {
    std::filebuf file;
    file.open(filename, std::ios::in | std::ios::binary);
    const auto buffer = std::make_unique_for_overwrite<char[]>(bufferSize);
    if (iThread > 0) {
        file.pubseekoff(iThread * bufferSize, std::ios_base::cur);
    }
    while (file.sgetn(buffer.get(), bufferSize)) {
        if (nThreads > 1) {
            // Skip over the chunks owned by the other threads.
            file.pubseekoff((nThreads - 1) * bufferSize, std::ios_base::cur);
        }
    }
}

int main() {
    timeit("sequential", [] { readFile(); });
    timeit("parallel", [] {
        std::vector<std::jthread> threads;  // jthreads join on destruction
        for (int iThread = 0; iThread < nThreads; iThread++) {
            threads.emplace_back(readFile, nThreads, iThread);
        }
    });
}
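For reference, here is a minimal sketch of what a lower-level single-threaded read could look like, assuming Windows (given the fsutil command above) and going through the Win32 API directly with unbuffered I/O. FILE_FLAG_NO_BUFFERING requires sector-aligned buffer sizes, offsets, and addresses, so the 6'000'000-byte buffer is replaced by an aligned 4 MiB one:

#include <windows.h>
#include <malloc.h>
#include <cstdio>

int main() {
    // NO_BUFFERING bypasses the page cache and the extra kernel-to-user copy;
    // SEQUENTIAL_SCAN hints the access pattern to the OS.
    HANDLE file = CreateFileA("100GB", GENERIC_READ, FILE_SHARE_READ, nullptr,
                              OPEN_EXISTING,
                              FILE_FLAG_NO_BUFFERING | FILE_FLAG_SEQUENTIAL_SCAN,
                              nullptr);
    if (file == INVALID_HANDLE_VALUE) return 1;
    constexpr DWORD alignedBufferSize = 4 * 1024 * 1024;        // multiple of the sector size
    void * buffer = _aligned_malloc(alignedBufferSize, 4096);   // sector-aligned address
    unsigned long long total = 0;
    DWORD bytesRead = 0;
    while (ReadFile(file, buffer, alignedBufferSize, &bytesRead, nullptr) && bytesRead > 0) {
        total += bytesRead;  // consume the chunk here
    }
    std::printf("read %llu bytes\n", total);
    _aligned_free(buffer);
    CloseHandle(file);
}

Whether this actually beats std::filebuf here is something I have not verified; it just takes the library-level buffering out of the picture.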
Upvotes: 2
Views: 315
There are multiple possible bottlenecks: the drive itself, the bus and controller it sits behind, and the CPU that issues the requests and copies the data around.
If a process triggers a read operation, you have the following steps:
1. The process issues a read call, which the CPU turns into a request and hands to the drive.
2. The drive fetches the requested data and transfers it into memory.
3. The CPU copies the data from the kernel and library buffers into the buffer of the process, which then issues the next read.
Looking at these repeating steps, you will find that at any moment some resource is idle, waiting for one or two of the others to complete their task. That means none of these resources is 100% used, which strongly suggests room for improvement. Doing multiple reads in parallel simply lets the steps of different requests overlap, making better use of those resources and increasing throughput.
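As a minimal sketch of that overlap (my illustration, not benchmarked; error handling omitted): a single prefetch thread keeps the drive busy fetching the next chunk while the current one is consumed, without introducing any seeking:

#include <fstream>
#include <future>
#include <memory>
#include <utility>

constexpr std::streamsize chunkSize = 6'000'000;

int main() {
    std::ifstream file("100GB", std::ios::binary);
    auto a = std::make_unique<char[]>(chunkSize);
    auto b = std::make_unique<char[]>(chunkSize);

    auto readChunk = [&file](char * dst) {
        file.read(dst, chunkSize);
        return file.gcount();
    };

    std::streamsize n = readChunk(a.get());
    while (n > 0) {
        // Start fetching the next chunk into b while this thread processes a;
        // the drive's work (step 2) now overlaps the CPU's work (step 3).
        auto next = std::async(std::launch::async, readChunk, b.get());
        // ... consume a[0..n) here ...
        n = next.get();
        std::swap(a, b);
    }
}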
For HDDs, accessing different parts of the disk meant that the magnetic heads had to move mechanically and then wait until the platters had spun to the correct position. Operating systems optimized for this by laying out files so that consecutive blocks of a file sat close together on disk, reducing this seek time. That only helps sequential reads, not the random access pattern produced by multiple threads consuming different parts of the file.
For SSDs, there is zero or negligible seek time, so they handle the staggered reads from multiple threads much better. Many guidelines on optimizing disk operations therefore stem from the pre-SSD era and need to be taken with a grain of salt. Check when something was written and try to understand whether it is aimed at reducing seek operations before blindly applying it to an SSD-based system.
Upvotes: 4