frist

Reputation: 1958

Performance difference of getline operating with istream and FILE*

I'm trying to compare the performance of reading a file line by line. The first case is getline with string and istream, the second is getline with char* and FILE*. I'm wondering:

  1. Why is the first case so much slower?
  2. Is it possible to make the C++ snippet faster?

Consider the output below (ifstream first):

Lines count: 10628126
ifstream getline: 43.2684
Lines count: 10628126
fopen getline: 1.06217

FILE* first:

Lines count: 10628126
fopen getline: 1.96065
Lines count: 10628126
ifstream getline: 43.0428

The code I used for testing:

#include <fstream>
#include <iostream>
#include <string>

#include <sys/time.h>
#include <stdio.h>


using namespace std;

double gettime()
{
    double result = 0;
    struct timeval tv = {0};
    struct timezone tz = {0};
    gettimeofday(&tv, &tz);
    result = tv.tv_sec + (1.0 * tv.tv_usec / 1000000);
    return result;
}

void read_cpp(const char * filename)
{
    ifstream ifile(filename);
    string line;
    unsigned int i = 0;
    while(getline(ifile, line)) i++;
    cout << "Lines count: " << i << endl;
}

void read_c(const char * filename)
{
    FILE * ifile = fopen(filename, "r");
    size_t linesz = 4096+1;
    char * line = new char[linesz];
    unsigned int i = 0;
    while(getline(&line, &linesz, ifile) > 0) i++;
    delete[] line;
    cout << "Lines count: " << i << endl;
    fclose(ifile);
}

int main(int argc, char * argv[])
{
    double tmstart;
    tmstart = gettime();
    read_cpp(argv[1]);
    cout << "ifstream getline: " << (gettime() - tmstart) << endl;
    tmstart = gettime();
    read_c(argv[1]);
    cout << "fopen getline: " << (gettime() - tmstart) << endl;
}
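
As a side note, the timing itself could be done with std::chrono rather than gettimeofday, which avoids the platform-specific header and uses a monotonic clock. A minimal sketch, assuming C++11; time_seconds is a hypothetical helper name, not part of the test above:

```cpp
#include <chrono>

// Hypothetical helper (not in the original test): runs `fn` once and returns
// the elapsed time in seconds, measured with a monotonic clock.
template <typename Fn>
double time_seconds(Fn&& fn)
{
    const auto start = std::chrono::steady_clock::now();
    fn();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

It would be called as time_seconds([&] { read_cpp(argv[1]); }) in place of the manual gettime() subtraction.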

P.S. I tried to swap read_cpp and read_c with almost no difference.

UPDATE

It looks like @Galik and @geza were unable to reproduce the issue using the g++ compiler, so I checked the code in a Linux environment, and there is almost no difference between the C and C++ implementations. So it seems to be an environment problem. Originally I measured the time on Mac OS X with the default C++ compiler, which turns out to be clang (to my surprise):

$ g++ -v
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 8.1.0 (clang-802.0.42)
Target: x86_64-apple-darwin16.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

None of this happens with the real g++:

$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.9/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ...
Thread model: posix
gcc version 4.9.2 (Debian 4.9.2-10)

Sorry for the inconvenience.

UPDATE2

I've found a related topic, "clang++ fstreams 10X slower than g++". Its author also faced a performance drop for code compiled by clang. To resolve the issue, one can use a different standard library implementation (-stdlib=libstdc++) instead of the default one (-stdlib=libc++). In this case clang will show a deprecation warning:

clang: warning: libstdc++ is deprecated; move to libc++ [-Wdeprecated]

but performance will be much better (even without optimization):

Lines count: 10628126
fopen getline: 1.02899
Lines count: 10628126
ifstream getline: 1.67594

Upvotes: 5

Views: 8908

Answers (1)

Richard Hodges

Reputation: 69912

The C++ version does a lot more bounds checking, locale interpretation and iostream state management. It's extremely robust.

The C version is minimalist and much more brittle.

There is a price for safety and utility.

That price is time.
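
If only the line count is needed, one way to avoid paying that price per line is to read the file in large blocks and count newline characters. This is a rough sketch, not a drop-in replacement for getline: it assumes lines end in '\n', and the function name and the 64 KiB block size are arbitrary choices for illustration. Whether it is actually faster depends on the standard library in use, so measure it.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <vector>

// Hypothetical helper: counts lines by scanning raw blocks for '\n'
// instead of extracting each line into a std::string.
std::size_t count_lines_blockwise(const char* filename)
{
    std::ifstream in(filename, std::ios::binary);
    if (!in) return 0;  // assumption: treat an unreadable file as empty

    std::vector<char> buf(64 * 1024);  // arbitrary block size
    std::size_t lines = 0;
    char last = '\n';  // so that an empty file yields 0 lines

    // read() fails on the final short block, so also check gcount().
    while (in.read(buf.data(), static_cast<std::streamsize>(buf.size())) ||
           in.gcount() > 0) {
        const std::streamsize n = in.gcount();
        lines += static_cast<std::size_t>(
            std::count(buf.begin(), buf.begin() + n, '\n'));
        last = buf[static_cast<std::size_t>(n) - 1];
    }

    // Count a final line that has no trailing newline, as std::getline would.
    if (last != '\n') ++lines;
    return lines;
}
```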

update:

The C getline expects a buffer managed with malloc and free (it may realloc it), not new and delete.

Here is the corrected version:

#include <cstdlib>
#include <cstdio>
#include <iostream>

void read_c(const char * filename)
{
    FILE * ifile = fopen(filename, "r");
    size_t linesz = 0;
    char * line = nullptr;
    unsigned int i = 0;
    while(getline(&line, &linesz, ifile) > 0) i++;
    free(line);
    std::cout << "Lines count: " << i << std::endl;
    fclose(ifile);
}
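
The version above still assumes that fopen succeeds. A slightly more defensive sketch follows, assuming a POSIX system where getline is declared in <stdio.h>; count_lines_c and its -1 error convention are made up for illustration:

```cpp
#include <cstdio>
#include <cstdlib>
#include <stdio.h>      // POSIX getline
#include <sys/types.h>  // ssize_t

// Hypothetical helper: returns the number of lines, or -1 if the file
// cannot be opened.
long count_lines_c(const char* filename)
{
    FILE* ifile = std::fopen(filename, "r");
    if (!ifile) return -1;

    char* line = nullptr;  // getline allocates and grows this with malloc/realloc
    size_t linesz = 0;
    long count = 0;
    while (getline(&line, &linesz, ifile) != -1) ++count;

    std::free(line);  // must be free, not delete[]
    std::fclose(ifile);
    return count;
}
```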

Upvotes: 2
