Jaehong Lee

Reputation: 108

How to maximize SSD I/O in C++?

I wrote a program that reads and writes a 2D array to an NVMe SSD (Samsung 970 EVO Plus).

I designed the program to read the N*M array like this:

#pragma omp parallel for
for(int i=0;i<N;i++)
  fstream.read(...) // read M bytes

However, this code achieves far lower throughput (on the order of KB/s) than the SSD's specification (on the order of GB/s).

I expected that if the size M is larger than the block size (maybe 4 KB) and a multiple of 2, this code would reach GB/s throughput.

However, it doesn't. I think I am missing something.

Is there C++ code that maximizes I/O performance on an SSD?

Upvotes: 5

Views: 964

Answers (2)

Sam Varshavchik

Reputation: 118435

No matter how much data you ask fstream to read, the read is likely to be served out of a fixed-size streambuf buffer. The C++ standard does not specify the default size, but 4 KB is fairly common. So passing a 4 MB size to read() will very likely be reduced, in effect, to 1024 reads of 4 KB each. This likely explains the performance you observed: you are not reading one large chunk of data at once; your application is making many calls that each read a small chunk.

The C++ standard does provide a means of resizing the internal stream buffer, via the stream buffer's pubsetbuf method, but it leaves it to each C++ implementation to specify exactly when and how a stream buffer can be configured with a non-default size. Your C++ implementation may allow you to resize the stream buffer only before opening your std::ifstream, or it may not allow you to resize a std::ifstream's default stream buffer at all; in that case you must construct your own stream buffer instance first and then use rdbuf() to attach it to the std::ifstream. Consult your C++ library's documentation for more information.
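As a rough illustration of the pubsetbuf approach, here is a minimal sketch. It assumes an implementation that honors a buffer supplied before the file is opened, which the standard does not guarantee. The helper name read_with_big_buffer and the 1 MiB size are made up for this example.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads up to `count` bytes from `path` through an
// ifstream whose stream buffer was asked to use a 1 MiB array.
std::vector<char> read_with_big_buffer(const std::string& path, std::size_t count) {
    // Declared before the stream so that it outlives the stream.
    std::vector<char> stream_buffer(1 << 20);

    std::ifstream in;
    // Request the larger buffer BEFORE open(). Whether the request is honored
    // is implementation-defined; this sketch assumes that it is.
    in.rdbuf()->pubsetbuf(stream_buffer.data(),
                          static_cast<std::streamsize>(stream_buffer.size()));
    in.open(path, std::ios::binary);

    std::vector<char> data(count);
    in.read(data.data(), static_cast<std::streamsize>(count));
    // gcount() reports how many bytes the last read actually delivered.
    data.resize(static_cast<std::size_t>(in.gcount()));
    return data;
}
```

Calling read_with_big_buffer("array.bin", M) would return up to M bytes from the start of the file, or an empty vector if the file cannot be opened. Whether it is actually faster has to be measured on your implementation.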

Alternatively, consider using your operating system's native file input/output system calls and bypassing the stream buffer layer altogether, since that layer adds some overhead of its own. It is likely that the contents of the file are first read into the stream buffer and then copied into the buffer you pass to read(). Calling the native file input system calls eliminates this redundant copy and squeezes out a little more performance.
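For a sense of what the native route can look like, here is a minimal sketch that assumes a POSIX system (open, pread, close). The helper name read_chunk_posix is hypothetical, and error handling is kept to a bare minimum.

```cpp
#include <cstddef>
#include <string>
#include <vector>

#include <fcntl.h>      // open, O_RDONLY (POSIX)
#include <sys/types.h>  // off_t, ssize_t
#include <unistd.h>     // pread, close (POSIX)

// Hypothetical helper, POSIX only: reads up to `count` bytes starting at
// byte `offset` directly into the caller-visible buffer, with no
// intermediate stream buffer.
std::vector<char> read_chunk_posix(const std::string& path, off_t offset,
                                   std::size_t count) {
    std::vector<char> data(count);
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) return {};

    std::size_t total = 0;
    while (total < count) {
        // pread takes an explicit offset and does not move the file position.
        ssize_t n = ::pread(fd, data.data() + total, count - total,
                            offset + static_cast<off_t>(total));
        if (n <= 0) break;  // 0 means end of file; negative means error
        total += static_cast<std::size_t>(n);
    }
    ::close(fd);

    data.resize(total);  // keep only the bytes actually read
    return data;
}
```

Because pread takes the offset as an argument, separate calls do not share a file position, so this shape also suits reading different rows from different threads.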

Upvotes: 6

Adrian Mole

Reputation: 51905

You are probably asking for trouble in trying to parallelize read() calls on an istream object (which is, essentially, a serial mechanism).

From cppreference for istream::read (bolding mine):

Modifies the elements in the array pointed to by s and the stream object. Concurrent access to the same stream object may cause data races, except for the standard stream object cin when this is synchronized with stdio (in this case, no data races are initiated, although no guarantees are given on the order in which extracted characters are attributed to threads).
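One way to keep the OpenMP loop without sharing a stream is to give each thread its own std::ifstream and seek to the row it needs. The following is only a sketch under assumptions: the helper name read_rows_parallel is hypothetical, and the file is assumed to store the rows back to back, row_bytes bytes each. It removes the data race; it is not guaranteed to reach the drive's rated throughput.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads `rows` rows of `row_bytes` bytes each into one
// contiguous vector. Compile with OpenMP enabled (e.g. -fopenmp) to run the
// loop in parallel; without it the pragmas are ignored and it runs serially.
std::vector<char> read_rows_parallel(const std::string& path, int rows,
                                     std::size_t row_bytes) {
    std::vector<char> data(static_cast<std::size_t>(rows) * row_bytes);

    #pragma omp parallel
    {
        // One private stream per thread, so no stream object is shared.
        std::ifstream in(path, std::ios::binary);

        #pragma omp for
        for (int i = 0; i < rows; ++i) {
            const std::size_t start = static_cast<std::size_t>(i) * row_bytes;
            in.seekg(static_cast<std::streamoff>(start));
            // Each iteration writes to its own disjoint slice of `data`.
            in.read(data.data() + start,
                    static_cast<std::streamsize>(row_bytes));
        }
    }
    return data;
}
```

Each thread has its own file position and its own destination slice, so the concurrent-access problem described in the quote does not arise.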

Upvotes: 3
