Passer By

Reputation: 21131

Comparison of C and C++ file read performance

I recently had the need to read a non-trivially sized file line by line, and to squeeze out performance I decided to follow some advice I'd been given, which states that fstreams are much slower than C-style I/O. However, despite my best efforts, I have not been able to reproduce the same dramatic difference (I see ~25%, which is large but not insane). I also tried fscanf and found it to be slower by an order of magnitude.

My question is: what is causing the performance difference under the covers, and why is fscanf so abysmal?

The following is my code (compiled with TDM GCC 5.1.0):

#include <cstdio>
#include <chrono>
#include <fstream>
#include <iostream>
#include <string>

using namespace std;
using namespace std::chrono;

// RAII wrapper around a C FILE*
struct file
{
    file(const char* str, const char* mode)
        : fp(fopen(str, mode)){}
    ~file(){fclose(fp);}
    FILE* fp;
};

constexpr size_t bufsize = 256;
// Copies one '\n'-terminated word from buf into word, starting at pos.
// Returns the index just past the '\n', or 0 if the buffer ran out first.
auto readWord(int pos, char*& word, char* const buf)
{
    for(; pos != bufsize && buf[pos] != '\n'; ++word, ++pos)
        *word = buf[pos];
    if(pos == bufsize)
        return 0;
    *word = '\0';
    return pos + 1;
}

void readFileC()
{
    file in{"inC.txt", "r"};
    char buf[bufsize];
    char word[40];

    char* pw = word;
    int sz = fread(buf, 1, bufsize, in.fp);
    for(; sz == bufsize; sz = fread(buf, 1, bufsize, in.fp))
    {
        for(auto nextPos = readWord(0, pw, buf); nextPos != 0; nextPos = readWord(nextPos, pw, buf))
        {
            //use word here
            pw = word;
        }
    }

    for(auto nextPos = readWord(0, pw, buf); nextPos < sz; nextPos = readWord(nextPos, pw, buf))
    {
        //use word here
        pw = word;
    }
}

void readFileCline()
{
    file in{"inCline.txt", "r"};
    char word[40];
    while(fscanf(in.fp, "%s", word) != EOF);
        //use word here
}

void readFileCpp()
{
    ifstream in{"inCpp.txt"};
    string word;
    while(getline(in, word));
        //use word here
}

int main()
{
    static constexpr int runs = 1;

    auto countC = 0;
    for(int i = 0; i < runs; ++i)
    {
        auto start = steady_clock::now();
        readFileC();
        auto dur = steady_clock::now() - start;
        countC += duration_cast<milliseconds>(dur).count();
    }
    cout << "countC: " << countC << endl;

    auto countCline = 0;
    for(int i = 0; i < runs; ++i)
    {
        auto start = steady_clock::now();
        readFileCline();
        auto dur = steady_clock::now() - start;
        countCline += duration_cast<milliseconds>(dur).count();
    }
    cout << "countCline: " << countCline << endl;

    auto countCpp = 0;
    for(int i = 0; i < runs; ++i)
    {
        auto start = steady_clock::now();
        readFileCpp();
        auto dur = steady_clock::now() - start;
        countCpp += duration_cast<milliseconds>(dur).count();
    }

    cout << "countCpp: " << countCpp << endl;
}

Run with a file of size 1070 KB, these are the results:

countC: 7
countCline: 61
countCpp: 9

EDIT: the three test cases now read different files and each runs only once. The results are exactly 1/20 of reading the same file 20 times. countC consistently outperforms countCpp even when I flip the order in which they are run.

Upvotes: 1

Views: 636

Answers (1)

abelenky

Reputation: 64682

fscanf has to parse the format string parameter, looking for every possible % sign and interpreting it, along with width specifiers, escape characters, and so on. It has to walk the format string more or less one character at a time, working through a very large set of potential conversions. Even if your format is as simple as "%s", there is still a lot of overhead involved relative to the other techniques, which simply grab a bunch of bytes with almost no interpretation or conversion.
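
For illustration, a minimal fgets-based sketch (assuming the same one-word-per-line input, and reusing the question's file name and 40-byte buffer) side-steps the format-string parsing entirely while still producing one token per call:

#include <cstdio>

void readFileCgets()
{
    FILE* fp = fopen("inCline.txt", "r");
    if(!fp)
        return;
    char word[40];
    while(fgets(word, sizeof word, fp))
    {
        //use word here (note: fgets keeps the trailing '\n', unlike "%s")
    }
    fclose(fp);
}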

Upvotes: 2
