Reputation: 10156
My current parser is given below - Reading in ~10MB CSV to an STL vector takes ~30secs, which is too slow for my liking given I've got over 100MB which needs to be read in every time the program is run. Can anyone give some advice on how to improve performance? Indeed, would it be faster in plain C?
int main() {
std::vector<double> data;
std::ifstream infile( "data.csv" );
infile >> data;
std::cin.get();
return 0;
}
std::istream& operator >> (std::istream& ins, std::vector<double>& data)
{
data.clear();
// Reserve data vector
std::string line, field;
std::getline(ins, line);
std::stringstream ssl(line), ssf;
std::size_t rows = 1, cols = 0;
while (std::getline(ssl, field, ',')) cols++;
while (std::getline(ins, line)) rows++;
std::cout << rows << " x " << cols << "\n";
ins.clear(); // clear bad state after eof
ins.seekg(0);
data.reserve(rows*cols);
// Populate data
double f = 0.0;
while (std::getline(ins, line)) {
ssl.str(line);
ssl.clear();
while (std::getline(ssl, field, ',')) {
ssf.str(field);
ssf.clear();
ssf >> f;
data.push_back(f);
}
}
return ins;
}
NB: I have also have openMP at my disposal, and the contents will eventually be used for GPGPU computation with CUDA.
Upvotes: 0
Views: 915
Reputation: 564
apparently, file io is a bad idea, just map the whole file into memory, access the csv file as a continous vm block, this incur only a few syscall
Upvotes: -1
Reputation: 2862
On my machine, your reserve code takes about 1.1 seconds and your populate code takes 8.5 seconds.
Adding std::ios::sync_with_stdio(false); made no difference to my compiler.
The below C code takes 2.3 seconds.
int i = 0;
int j = 0;
while( true ) {
float x;
j = fscanf( file, "%f", & x );
if( j == EOF ) break;
data[i++] = x;
// skip ',' or '\n'
int ch = getc(file);
}
Upvotes: 3
Reputation: 74078
You could half the time by reading the file once and not twice.
While presizing the vector is beneficial, it will never dominate runtime, because I/O will always be slower by some magnitude.
Another possible optimization could be reading without a string stream. Something like (untested)
int c = 0;
while (ins >> f) {
data.push_back(f);
if (++c < cols) {
char comma;
ins >> comma; // skip comma
} else {
c = 0; // end of line, start next line
}
}
If you can omit the ,
and separate the values by white space only, it could be even
while (ins >> f)
data.push_back(f);
or
std::copy(std::istream_iterator<double>(ins), std::istream_iterator<double>(),
std::back_inserter(data));
Upvotes: 5
Reputation: 37970
Try calling
std::ios::sync_with_stdio(false);
at the start of your program. This disables the (allegedly quite slow) synchronization between cin
/cout
and scanf
/printf
(I have never tried this myself, but have often seen the recommendation, such as here). Note that if you do this, you cannot mix C++-style and C-style IO in your program.
(In addition, Olaf Dietsche is completely right about only reading the file once.)
Upvotes: 2