Reputation: 31
I have an information retrieval and storage course project, and for the first part I have to find the optimum buffer size for reading big files from the hard disk. Our T.A. says that as the buffer size increases up to a certain point (usually 4 bytes) the reading speed will increase, but after that it decreases. With my code below, however, it just increases no matter the buffer size or the file size (I have tested it on 100 MB). From what I know, buffering only makes sense in parallel asynchronous processes (like threads), and that expectation for the buffer size vs. reading speed curve should hold true when the file is defragmented and/or the cost of looking up the file directory and addresses (for the disk) is significant enough. So is the problem related to my code, to the way ifstream handles things, or do those conditions just not hold here?
ifstream in("D:ISR\\Articles.dat", std::ifstream::binary);
if(in)
{
    in.seekg(0, in.end);
    int length = in.tellg();
    length = 100 * 1024 * 1024; // only the first 100 MB are read
    int bufferSize = 2;
    int blockSize = 1024; // 1 kB
    int numberOfBlocks = length / blockSize;
    if(length % blockSize > 0) numberOfBlocks++;
    clock_t t;
    double time;
    // each pass doubles bufferSize and re-reads the file from the start
    for(int i = 0; i < 5; i++)
    {
        in.seekg(0, in.beg);
        int position = 0;
        int bufferPosition;
        char* streamBuffer = new char[bufferSize];
        in.rdbuf()->pubsetbuf(streamBuffer, bufferSize);
        t = clock();
        for(int block = 0; block < numberOfBlocks; block++)
        {
            char* buffer = new char[blockSize];
            bufferPosition = 0;
            // fill one block, bufferSize bytes per read() call
            while(bufferPosition < blockSize && position < length)
            {
                in.read(buffer + bufferPosition, bufferSize);
                position += bufferSize;
                bufferPosition += bufferSize;
            }
            delete[] buffer;
        }
        t = clock() - t;
        time = double(t) / CLOCKS_PER_SEC;
        cout << "Buffer size : " << bufferSize << " -> Total time in seconds : " << time << "\n";
        bufferSize *= 2;
    }
}
Upvotes: 2
Views: 2149
Reputation: 129
it just increases no matter the buffer size or the file size
The above statement does not hold true. Because you measure your program repeatedly, each successive result will be better than the previous ones thanks to the system cache: you are actually reading the file content from the system cache instead of the hard disk. But once the buffer size exceeds a threshold, the reading performance will decrease. Chapter 3 of W. Richard Stevens's APUE (Advanced Programming in the UNIX Environment), 2nd edition, has detailed and extensive experiments on read and write buffer sizes.
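To make the caching effect visible, here is a minimal sketch of a chunked-read timer. The helper names `writeTestFile` and `readInChunks` and the file name used in the usage note are made up for illustration, and the sketch uses `std::chrono::steady_clock` for wall-clock time instead of `clock()`. It is not a rigorous benchmark, only a starting point.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: creates a file of `bytes` bytes so there is something to read.
void writeTestFile(const std::string& path, std::size_t bytes)
{
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    const std::vector<char> data(bytes, 'x');
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
}

// Hypothetical helper: reads `path` in chunks of `chunkSize` bytes.
// Returns the number of bytes read and stores the elapsed wall-clock time in `seconds`.
std::size_t readInChunks(const std::string& path, std::size_t chunkSize, double& seconds)
{
    assert(chunkSize > 0); // a zero-sized chunk would never reach end of file
    std::ifstream in(path, std::ios::binary);
    std::vector<char> chunk(chunkSize);
    std::size_t total = 0;

    const auto start = std::chrono::steady_clock::now();
    while (in) {
        in.read(chunk.data(), static_cast<std::streamsize>(chunk.size()));
        total += static_cast<std::size_t>(in.gcount()); // counts the final partial chunk too
    }
    seconds = std::chrono::duration<double>(std::chrono::steady_clock::now() - start).count();
    return total;
}
```

A possible way to use it: call `writeTestFile("test.bin", 100 * 1024 * 1024)` once, then call `readInChunks` in a loop with chunk sizes 2, 4, 8, and so on. Running the whole loop twice and comparing the two passes should show how much of the speedup comes from the system cache rather than from the chunk size.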
Upvotes: 0
Reputation: 56519
what I know buffering only makes sense in parallel asynchronous processes
No! No! Buffering makes sense in many situations. A common one is I/O: if you increase the size of the read/write buffer, the operating system has to touch the I/O device less often,
and it can read/write larger blocks in each operation. As a result, performance gets better.
Choose a buffer size that is a power of two (2^n), such as 128, 512, 1024, ...; otherwise performance can decrease.
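As a rough illustration of "fewer operations with a larger buffer", the sketch below counts how many `read()` calls it takes to consume a stream for a given buffer size. The function name `countReads` is made up for this example, and it counts calls at the C++ stream level only. The number of actual operating system requests also depends on the stream's internal buffer, which this sketch does not measure.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns how many read() calls delivered data
// while consuming `in` with a buffer of `bufferSize` bytes.
std::size_t countReads(std::istream& in, std::size_t bufferSize)
{
    if (bufferSize == 0) return 0; // nothing can be read with an empty buffer
    std::vector<char> buffer(bufferSize);
    std::size_t calls = 0;
    // A full read succeeds; the last partial read fails but still reports gcount() > 0.
    while (in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()))
           || in.gcount() > 0) {
        ++calls;
    }
    return calls;
}
```

For 1000 bytes of input, a 128-byte buffer needs 8 calls (7 full reads plus one partial read of 104 bytes), while a 1024-byte buffer needs only one.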
Upvotes: 4