Reputation: 108
I wrote a program that reads and writes a 2D array to an NVMe SSD (Samsung 970 EVO Plus).
I designed the program to read N*M bytes like this:
#pragma omp parallel for
for (int i = 0; i < N; i++)
    fstream.read(...); // read M bytes
However, this code achieves throughput on the order of KB/s, far below the GB/s figures in the SSD's specification.
I expected that if the size M is larger than the block size (maybe 4 KB) and a multiple of 2, the code would reach GB/s performance.
It doesn't, so I think I'm missing something.
Is there C++ code that maximizes I/O performance on an SSD?
Upvotes: 5
Views: 964
Reputation: 118435
No matter how much you tell fstream to read, the request is likely to be served out of a fixed-size streambuf buffer. The C++ standard does not specify the buffer's default size, but 4 KB is fairly common. So passing a 4 MB size to read() will very likely end up being reduced, in effect, to 1024 calls that each read 4 KB of data. This likely explains your observed performance: you're not reading one large chunk of data at once; your application is making many calls that each read a small chunk.
The C++ standard does provide a means of resizing the internal stream buffer, via the pubsetbuf method, but it leaves it to each C++ implementation to specify exactly when and how a stream buffer can be configured with a non-default size. Your C++ implementation may allow you to resize the stream buffer only before opening your std::ifstream, or it may not allow you to resize a std::ifstream's default stream buffer at all, in which case you must construct your own stream buffer instance first and then use rdbuf() to attach it to the std::ifstream. Consult your C++ library's documentation for more information.
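As a rough illustration of the pubsetbuf approach, here is a minimal sketch. It assumes an implementation that honors a buffer installed before the file is opened; whether yours does is implementation-defined, so measure before relying on it. The function name read_with_big_buffer and the 1 MiB default are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads up to `count` bytes from `path` into `dst`
// through a stream that was asked to use a larger internal buffer.
// Returns the number of bytes actually read (0 if the file cannot be opened).
std::size_t read_with_big_buffer(const std::string& path, char* dst,
                                 std::size_t count,
                                 std::size_t buffer_bytes = 1 << 20) {
    assert(dst != nullptr);

    // Declared before the stream so that it outlives the stream.
    std::vector<char> stream_buffer(buffer_bytes);

    std::ifstream in;
    // Install the buffer BEFORE open(); the effect is implementation-defined
    // and some libraries may ignore the request.
    in.rdbuf()->pubsetbuf(stream_buffer.data(),
                          static_cast<std::streamsize>(stream_buffer.size()));
    in.open(path, std::ios::binary);
    if (!in) {
        return 0;
    }

    in.read(dst, static_cast<std::streamsize>(count));
    return static_cast<std::size_t>(in.gcount());
}
```

Calling read_with_big_buffer("matrix.bin", row.data(), row.size()) would then go through the enlarged buffer, if the library accepts it.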
Alternatively, consider using your operating system's native file input/output system calls and bypassing the stream buffer library altogether, since it adds some overhead of its own. It's likely that the file's contents first get read into the stream buffer and are then copied into the buffer you pass to read(). Calling the native file input system calls directly eliminates this redundant copy and squeezes out a little more performance.
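For example, on a POSIX system (an assumption; the question does not name the OS), the native route could look roughly like the sketch below. It uses pread(), which reads at an explicit offset straight into the caller's buffer. The helper name read_fully_at is made up for this example.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: reads `count` bytes starting at byte `offset` of an
// already opened descriptor directly into `dst`, with no intermediate stream
// buffer. Loops because pread() may return fewer bytes than requested.
// Returns the number of bytes read (less than `count` at end of file),
// or -1 on error.
ssize_t read_fully_at(int fd, char* dst, std::size_t count, off_t offset) {
    assert(dst != nullptr);

    std::size_t done = 0;
    while (done < count) {
        const ssize_t n = pread(fd, dst + done, count - done,
                                offset + static_cast<off_t>(done));
        if (n < 0) {
            if (errno == EINTR) {
                continue;  // interrupted by a signal: retry
            }
            return -1;     // real error; errno is set by pread()
        }
        if (n == 0) {
            break;         // end of file
        }
        done += static_cast<std::size_t>(n);
    }
    return static_cast<ssize_t>(done);
}
```

Typical use is to open the file once with open(path, O_RDONLY) and call read_fully_at(fd, row, M, i * M) for row i. Because pread() does not touch the descriptor's shared file position, several threads can call it on the same descriptor.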
Upvotes: 6
Reputation: 51905
You are probably asking for trouble in trying to parallelize read() calls on an istream object (which is, essentially, a serial mechanism).
From cppreference for istream::read (bolding mine):
Modifies the elements in the array pointed to by s and the stream object. Concurrent access to the same stream object may cause data races, except for the standard stream object cin when this is synchronized with stdio (in this case, no data races are initiated, although no guarantees are given on the order in which extracted characters are attributed to threads).
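One way to avoid sharing a stream, sketched here as an illustration rather than as a tuned solution, is to give every thread its own std::ifstream and have it seek to the rows it owns. The sketch assumes the N*M bytes are stored row by row in a single file; the function name read_rows is made up. Compiled without OpenMP, the pragmas are ignored and the code runs serially.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads n_rows rows of row_bytes bytes each from `path`
// into `out`. Every thread opens a private stream, so no stream object is
// accessed concurrently. Returns false if any open or read failed.
bool read_rows(const std::string& path, std::size_t n_rows,
               std::size_t row_bytes, std::vector<char>& out) {
    out.assign(n_rows * row_bytes, 0);
    bool ok = true;

    #pragma omp parallel
    {
        std::ifstream in(path, std::ios::binary);  // private to this thread
        bool local_ok = static_cast<bool>(in);

        #pragma omp for
        for (long long i = 0; i < static_cast<long long>(n_rows); ++i) {
            if (!local_ok) {
                continue;  // this thread already failed; skip its remaining rows
            }
            // Jump to the start of row i, then read it into its slot in `out`.
            in.seekg(static_cast<std::streamoff>(i) *
                     static_cast<std::streamoff>(row_bytes));
            in.read(out.data() + static_cast<std::size_t>(i) * row_bytes,
                    static_cast<std::streamsize>(row_bytes));
            if (!in) {
                local_ok = false;
            }
        }

        if (!local_ok) {
            #pragma omp critical
            ok = false;
        }
    }
    return ok;
}
```

Each thread writes only to its own rows of out, so the destination buffer needs no locking. Whether this reaches the drive's rated throughput still depends on M and on the stream buffer behaviour.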
Upvotes: 3