Buffered input versus standard input

Question

I was trying to read a long list of numbers (around 10^7) from an input file. After some searching, I found that reading the contents into a buffer performs better than reading the numbers one by one.

My second program performs better than the first. The first program uses the cin stream object and the second uses a stringstream object. What is the difference between the two in terms of I/O performance?

#include <iostream>
using namespace std;

int main()
{
    int n,k;
    cin >> n >> k;
    int count = 0;
    while ( n-- > 0 )
    {
        int num;
        cin >> num;
        if( num % k == 0 )
            count++;
    }
    cout << count << endl;
    return 0;
}

This program takes longer than the following code, which uses buffered input.

#include <iostream>
#include <sstream>
using namespace std;

int main()
{
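    // Seeking on cin only works when stdin is redirected from a regular file.
    cin.seekg(0, cin.end);
    int length = cin.tellg();
    cin.seekg(0, cin.beg);
    // One extra zero-initialized byte so the buffer is a valid C string.
    char *buffer = new char[length + 1]();
    cin.read(buffer,length);
    stringstream ss(buffer);
    delete[] buffer; // ss keeps its own copy of the data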
    int n,k;
    ss >> n >> k;
    int result = 0;
    while( n-- )
    {
        int num;
        ss >> num;
        if( num % k == 0 )
            result++;
    }
    cout << result << endl;
    return 0;
}

Thanatos · Accepted Answer

The second one will require roughly twice the file's size in memory (presumably the raw buffer plus the copy held by the stringstream). Otherwise, since it reads the entire file in one call, it will likely read the data into memory as fast as the underlying storage can feed it, and then process it as fast as the CPU can.

It'd be good to avoid the memory cost, and in that respect, your first program is better. On my system, using an input called test.txt that looks like:

10000000 2
13
13
< 10000000-2 more "13"s. >

with your first program called a and your second called b, I get:

% time ./a



By default, cin is not buffered, so that it stays "synchronized" with stdio. See this excellent answer for a good explanation. To make it buffered, I added cin.sync_with_stdio(false) to the top of your first program and called the result c, which runs perhaps slightly faster:

% time ./c 


(Note: the times waffle around a bit, and I only ran a few tests, but c seems to be at least as fast as b.)

Your second program runs quickly because, although cin is not buffered, it can fetch the whole input with a single read call. The first program must issue a read call for each cin >>, whereas the third program can buffer its input and issue a read call only every now and then.

Note that after adding this line you can no longer read from stdin through the C FILE * of that name, or call any library functions that would do so. In practice, this is unlikely to be an issue.
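
As an aside, the read-everything-first approach of the second program can also be sketched without seekg/tellg, which generally fail when stdin is a pipe or terminal rather than a regular file. This is only an illustration, not the asker's method: the names slurp and count_divisible_buffered are hypothetical, and the whole input still ends up in memory (possibly twice, depending on whether the string is copied or moved into the stream).

```cpp
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>

// Reads everything remaining on `in` into one string, without seeking.
std::string slurp(std::istream& in)
{
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Same counting logic as the second program, but parsing from an
// in-memory copy of the input instead of from the stream directly.
long long count_divisible_buffered(std::istream& in)
{
    std::istringstream ss(slurp(in));
    long long n = 0, k = 0;
    ss >> n >> k;
    long long result = 0;
    while (n-- > 0)
    {
        long long num;
        if (!(ss >> num))
            break; // input ended before n numbers were read
        if (k != 0 && num % k == 0)
            ++result;
    }
    return result;
}
```

Calling count_divisible_buffered(std::cin) from main and printing the result would mirror the second program.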
