BillyJean

Reputation: 1587

loading numbers from a data-file efficiently

I have the following minimal working example, which illustrates how I am currently loading in a set of numbers from a data file "info.txt":

#include <iostream>
#include <fstream>
#include <vector>

using namespace std;

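int main() {
    double temp_var;
    vector<double> container_var;

    ifstream test("info.txt");

    // Read one number at a time until extraction fails (end of file or bad input).
    while (test >> temp_var)
    {
        container_var.push_back(temp_var);
    }

    // Guard against an empty or missing file before indexing.
    if (!container_var.empty())
        cout << container_var[0] << endl;
    return 0;
}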

The file "info.txt" contains floating-point numbers of the form

1.0
2.1
3.6
...

I am probably going to load 50,000-100,000 numbers (maybe even more), so I am interested in doing this efficiently. Is there something fundamental that I have missed in my example that may slow down the loading process?

Upvotes: 0

Views: 241

Answers (3)

Mihai Sebea

Reputation: 408

First, you need to read the data, for which you can:

a. open the file and read from it

b. allocate memory and copy the contents of the file into it

c. memory-map the file
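A minimal sketch of option (b), assuming the whole file fits comfortably in memory: the file size is queried once, a single std::string of exactly that size is allocated, and the contents are copied in with one read() call. The function name slurp_file is hypothetical.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>

// Option (b): allocate one buffer and copy the whole file into it.
// Nothing here parses the numbers yet; it only gets the raw text into memory.
std::string slurp_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    // Find the file size so the buffer is allocated exactly once.
    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    in.seekg(0, std::ios::beg);

    std::string buffer(static_cast<std::size_t>(size), '\0');
    in.read(&buffer[0], size);  // a single bulk read instead of many small ones
    return buffer;
}
```

The returned string can then be handed to whatever parser you use, which only ever sees memory, not the stream.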

Depending on the size of the file, I would say that (c) is the best option, because you avoid the cost of allocating and copying the data, and it's much faster than naively reading from the file.

Second, you need to parse the contents. Apparently the best way to do this is a hand-rolled loop; see http://tinodidriksen.com/2011/05/28/cpp-convert-string-to-double-speed/ for more details. I did try this myself, and it's the way to go for large files.

And third, you need to preallocate the buffer in which you store the result, in order to minimize allocations.
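One cheap way to size the result buffer, assuming the raw text is already in memory with one number per line (as in "info.txt"), is to count the newlines first and reserve() that many elements. A minimal sketch; make_result_buffer is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Estimate the value count from the number of lines and reserve it up front,
// so filling the vector does not trigger repeated reallocation.
// Assumption: one value per line; a missing final newline is allowed for.
std::vector<double> make_result_buffer(const std::string& text) {
    std::size_t lines = static_cast<std::size_t>(
        std::count(text.begin(), text.end(), '\n'));
    if (!text.empty() && text.back() != '\n') {
        ++lines;  // the last line has no trailing newline
    }
    std::vector<double> result;
    result.reserve(lines);  // capacity >= lines, size is still 0
    return result;
}
```

The extra pass over the text is a simple linear scan, which is typically cheap compared with parsing the numbers themselves.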

Of course, you need to measure performance: find the hotspots, eliminate them, rinse and repeat.

Upvotes: 1

mauve

Reputation: 2016

When you add a lot of elements to a std::vector, the vector grows as you add them. When the vector grows, all the existing data usually needs to be copied to the new buffer. You can tell the vector to reserve a lot of space before you add the elements, which keeps the number of grow-and-copy operations low:

std::vector<int> v(5000);

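The above creates a vector that already contains 5000 elements (value-initialized, which for int means 0), so a subsequent push_back() would append after them; to fill such a vector, assign through v[i] instead. Alternatively, you can reserve space after construction by calling std::vector::reserve(), which changes the capacity but not the size: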

std::vector<int> v;
v.reserve(10000); // ensure the vector has a capacity of at least 10k elements
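To see the effect described above, a small sketch can count how often the buffer is reallocated while appending n values, using a change in capacity() as the signal. The function name count_reallocations is hypothetical, and the exact count without reserve() depends on the standard library's growth strategy.

```cpp
#include <cstddef>
#include <vector>

// Count buffer reallocations while appending n values to a vector.
// A change in capacity() is taken as evidence that a reallocation happened.
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<double> v;
    if (reserve_first) {
        v.reserve(n);  // one allocation up front
    }
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<double>(i));
        if (v.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    return reallocations;
}
```

With reserve_first set to true the count is 0; without it, the vector grows geometrically, so the count is roughly logarithmic in n rather than linear.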

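While I think that this is the actual problem, the problem could also be in the line cout << container_var[0] << endl. std::endl flushes the stream's buffer, so it is usually slow. A third possible cause is that the std::cout stream is synchronized with the C stdio file APIs. In practice, this synchronization makes the standard C++ streams unbuffered, so every output operation is passed straight through to the C stdio layer. It only affects the standard streams (std::cin, std::cout, std::cerr, std::clog), not an std::ifstream. You can disable this synchronization, before performing any other I/O, with: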

std::cout.sync_with_stdio(false);

Upvotes: 1

Thomas Matthews

Reputation: 57688

If you know the number of values ahead of time, you can tell std::vector to preallocate the space with reserve(). This makes push_back more efficient, because the vector no longer has to reallocate and copy its contents as it grows.

Other optimization techniques include memory-mapped files and double buffering.
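A rough sketch of double buffering, assuming std::async is an acceptable way to overlap I/O with parsing: while one chunk is handed to the parser, the next chunk is already being read on another thread. Chunks are cut at the last newline and the incomplete tail is carried into the next chunk, so a number is never split. The names process_double_buffered and handle are hypothetical; handle stands for whatever parsing routine you use.

```cpp
#include <cstddef>
#include <fstream>
#include <future>
#include <ios>
#include <string>
#include <utility>

// Double buffering: read chunk N+1 in the background while chunk N is processed.
// Assumption: one number per line; chunk_size > 0; handle(const std::string&)
// receives text that always ends on a line boundary (except a final unterminated line).
template <class Handler>
void process_double_buffered(const std::string& path, std::size_t chunk_size,
                             Handler handle) {
    std::ifstream in(path, std::ios::binary);
    auto read_chunk = [&in, chunk_size]() {
        std::string buf(chunk_size, '\0');
        in.read(&buf[0], static_cast<std::streamsize>(chunk_size));
        buf.resize(static_cast<std::size_t>(in.gcount()));
        return buf;  // empty once the end of the file is reached
    };

    std::string carry;  // incomplete last line from the previous chunk
    std::future<std::string> next = std::async(std::launch::async, read_chunk);
    for (;;) {
        std::string chunk = next.get();
        if (chunk.empty()) break;
        // Start reading the following chunk before processing this one.
        next = std::async(std::launch::async, read_chunk);

        chunk.insert(0, carry);
        std::size_t last_nl = chunk.rfind('\n');
        if (last_nl == std::string::npos) {  // no complete line yet
            carry = std::move(chunk);
            continue;
        }
        carry.assign(chunk, last_nl + 1, std::string::npos);
        chunk.resize(last_nl + 1);
        handle(chunk);
    }
    if (!carry.empty()) handle(carry);  // final line without a newline
}
```

Only one read is in flight at a time, so the stream is never used by two threads at once. Some toolchains need -pthread to link std::async, and for files that fit in memory a single bulk read may well be just as fast.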

Upvotes: 1
