Reputation: 1201
I am writing a graph library that should read the most common graph formats. One format contains information like this:
e 4 3
e 2 2
e 6 2
e 3 2
e 1 2
....
and I want to parse these lines. I looked around on Stack Overflow and found a neat solution for this. I currently use an approach like this (file is an fstream):
string line;
while(getline(file, line)) {
if(!line.length()) continue; //skip empty lines
stringstream parseline(line);
char identifier;
parseline >> identifier; //read the first character
if(identifier == 'e') {
int n, m;
parseline >> n;
parseline >> m;
foo(n, m); //here I handle the input
}
}
It works well and as intended, but today when I tested it with huge graph files (50 MB+) I was shocked to find that this function is by far the worst bottleneck in the whole program:
The stringstream I use to parse each line accounts for almost 70% of the total runtime, and the getline call for 25%. The rest of the program uses only 5%.
Is there a fast way to read these big files, possibly avoiding slow stringstreams and the getline function?
Upvotes: 3
Views: 3016
Reputation: 726579
You can skip double-buffering your string, skip parsing the single character, and use strtoll to parse the integers, like this:
string line;
while(getline(file, line)) {
if(!line.length()) continue; //skip empty lines
if (line[0] == 'e') {
char *ptr;
int n = strtoll(line.c_str()+2, &ptr, 10);
int m = strtoll(ptr+1, &ptr, 10);
foo(n, m); //here you handle the input
}
}
In C++, strtoll is declared in the <cstdlib> header.
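If you are on C++17 or later, std::from_chars from <charconv> is another option: it is locale-independent and avoids errno handling. A minimal sketch along those lines, assuming fields are separated by single spaces as in the answer above (parse_edge is a hypothetical helper, not part of any library):

#include <cassert>
#include <charconv>
#include <string>

// Parse an "e n m" edge line with std::from_chars (C++17).
// Assumes single spaces between fields, like the +2/+1 offsets above.
bool parse_edge(const std::string& line, int& n, int& m) {
    if (line.size() < 2 || line[0] != 'e') return false;
    const char* p = line.data() + 2;              // skip "e "
    const char* end = line.data() + line.size();
    auto r1 = std::from_chars(p, end, n);
    if (r1.ec != std::errc{}) return false;
    auto r2 = std::from_chars(r1.ptr + 1, end, m); // skip the space
    return r2.ec == std::errc{};
}

int main() {
    int n = 0, m = 0;
    assert(parse_edge("e 4 3", n, m) && n == 4 && m == 3);
    return 0;
}

Unlike strtoll, from_chars takes an explicit end pointer, so it never reads past the buffer even when the input is not NUL-terminated.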
Upvotes: 3
Reputation: 41180
mmap the file and process it as a single big buffer.
If your system lacks mmap, you might try reading the file into a buffer that you malloc.
Rationale: most of the time is spent in the transitions from user mode to the kernel and back in the calls to the C library. Reading in the whole file eliminates almost all of those calls.
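A minimal sketch of this approach, assuming a POSIX system (handle_edge stands in for the asker's foo, and the error handling is abbreviated). Note one caveat: strtoll expects a NUL terminator, so real code should guard against a file whose size is an exact multiple of the page size, e.g. by copying the tail or checking bounds first.

#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

static long long edge_sum = 0;                    // stand-in for real work
static void handle_edge(long long n, long long m) { edge_sum += n + m; }

// Walk the mapped buffer directly: no getline, no stringstream per line.
static void parse_buffer(const char* p, const char* end) {
    while (p < end) {
        if (*p == 'e') {
            char* next;
            long long n = strtoll(p + 1, &next, 10);
            long long m = strtoll(next, &next, 10);
            handle_edge(n, m);
            p = next;
        }
        // advance to the start of the next line (also skips empty lines)
        const char* nl = static_cast<const char*>(memchr(p, '\n', end - p));
        p = nl ? nl + 1 : end;
    }
}

int main() {
    // Build a small demo file, then map and parse it in one pass.
    const char* path = "/tmp/graph_mmap_demo.txt";
    FILE* f = fopen(path, "w");
    fputs("e 4 3\ne 2 2\n\ne 6 2\n", f);
    fclose(f);

    int fd = open(path, O_RDONLY);
    struct stat st;
    fstat(fd, &st);
    const char* buf = static_cast<const char*>(
        mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0));
    assert(buf != MAP_FAILED);

    parse_buffer(buf, buf + st.st_size);

    munmap(const_cast<char*>(buf), st.st_size);
    close(fd);
    unlink(path);

    assert(edge_sum == (4 + 3) + (2 + 2) + (6 + 2));
    return 0;
}

The kernel is entered once to map the file (plus page faults handled transparently), instead of once per getline call, which is where the rationale above says the time goes.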
Upvotes: 1