Reputation: 87
EDIT: In the sample code, I originally stated I was printing to the console. That was just sample code to reference how I was doing my file i/o. I'm actually storing the data in a vector to be used later.
I'm using standard C++ file I/O to read a CSV file containing about 7 million records. Right now it takes about 80 seconds on an 8 GB PC, and I want to speed that up.
string line;
ifstream myfile("example.csv");
if (myfile.is_open())
{
    while (getline(myfile, line))
    {
        // cout << line << '\n'; -- edit: not printing to the console, but storing to an array
    }
    myfile.close();
}
Since the CSV file only has a single column, is there a way to quickly grab all of the data at once rather than going through row by row?
My understanding is that the transfer from the file to the program is what takes the longest, so I was thinking that if I could store all of the data from the file somewhere (not sure of this process exactly) and then hand it all at once to the C++ program, it should speed up the process.
Upvotes: 0
Views: 1576
Reputation: 41464
getline is already going to invoke block-based buffered reading on the file stream, and your OS is going to further optimize that access pattern with pre-caching. (Hell, your hard drive is probably going to get all clever about it.) It's not surprising that your program is taking so long, but that's because console output is a lot slower than file input (primarily because of the need to do a bunch of font rendering afterwards). Before you try to optimize your IO, implement the actual processing you want to perform on the file [and take out the console output], and see how fast it is then.
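A minimal sketch of that measurement, assuming the "actual processing" is just storing each line in a std::vector as the question's edit describes. The helper names read_lines and timed_read are made up for illustration, and the timing goes to stderr so it does not mix with any real output:

```cpp
#include <chrono>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical helper: reads every line of `path` into a vector.
// No console output happens inside the loop.
std::vector<std::string> read_lines(const std::string& path) {
    std::vector<std::string> records;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        records.push_back(line);
    }
    return records;
}

// Hypothetical helper: times one call to read_lines and reports the
// record count and elapsed wall-clock time on stderr.
std::vector<std::string> timed_read(const std::string& path) {
    const auto start = std::chrono::steady_clock::now();
    std::vector<std::string> records = read_lines(path);
    const auto stop = std::chrono::steady_clock::now();
    const auto ms =
        std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
    std::cerr << records.size() << " records in " << ms << " ms\n";
    return records;
}
```

Calling timed_read("example.csv") from an optimized build gives a baseline to compare against before trying anything more exotic. If the record count is roughly known up front, calling records.reserve(...) before the loop may cut down on reallocations.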
Upvotes: 3
Reputation: 7960
Printing 7 million lines to the console is very time-consuming. Not sure why you want to do that.
You can comment out the line with cout and see how fast it goes without the console print.
Reading large amounts of sequential data is not optimal with buffered I/O, since the data is copied twice (or more):
Disk --> Buffer --> program.
You can use unbuffered I/O via the open/read/close C functions (#include <io.h> on Windows; POSIX systems declare them in <fcntl.h> and <unistd.h>). This is less suitable for text processing.
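A rough sketch of the open/read/close approach, assuming a POSIX system (hence <fcntl.h> and <unistd.h> rather than <io.h>) and a file with one record per line terminated by '\n'. The function name read_records_unbuffered and the 1 MiB chunk size are arbitrary choices for illustration, not tuned values. It also shows why this is less convenient for text: the line splitting has to be done by hand, including lines that straddle two chunks.

```cpp
#include <fcntl.h>   // open (POSIX)
#include <unistd.h>  // read, close (POSIX)
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: reads `path` in large chunks with read(2),
// bypassing the C/C++ runtime buffer, and splits the data on '\n'.
std::vector<std::string> read_records_unbuffered(const char* path) {
    std::vector<std::string> records;
    const int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return records;  // could not open; return an empty result
    }

    std::vector<char> chunk(1 << 20);  // 1 MiB per read call (arbitrary)
    std::string pending;               // partial line carried between chunks
    ssize_t n;
    while ((n = read(fd, chunk.data(), chunk.size())) > 0) {
        const std::size_t len = static_cast<std::size_t>(n);
        std::size_t start = 0;
        for (std::size_t i = 0; i < len; ++i) {
            if (chunk[i] == '\n') {
                pending.append(chunk.data() + start, i - start);
                records.push_back(pending);
                pending.clear();
                start = i + 1;
            }
        }
        // Keep whatever follows the last newline for the next chunk.
        pending.append(chunk.data() + start, len - start);
    }
    if (!pending.empty()) {
        records.push_back(pending);  // final line without a trailing newline
    }
    close(fd);
    return records;
}
```

This sketch does not strip '\r' from Windows-style line endings, and the OS page cache still sits between the disk and the program, so whether it beats getline is something to measure rather than assume.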
Another alternative is to increase the buffer size used by the C runtime library via setvbuf. You can play with different sizes to see if it helps.
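A small sketch of the setvbuf route, assuming the file is read through C stdio (fopen/fgets) rather than ifstream. The function name read_records_stdio and the 4096-byte line piece are illustrative choices. Passing a null buffer asks the library to allocate one itself, and the requested size is only a hint that the implementation may or may not honor:

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: reads `path` line by line through C stdio,
// requesting a fully buffered stream of `buffer_bytes` bytes.
std::vector<std::string> read_records_stdio(const char* path,
                                            std::size_t buffer_bytes) {
    std::vector<std::string> records;
    std::FILE* fp = std::fopen(path, "r");
    if (!fp) {
        return records;  // could not open; return an empty result
    }

    // setvbuf must be called after fopen and before any other
    // operation on the stream.
    std::setvbuf(fp, nullptr, _IOFBF, buffer_bytes);

    char piece[4096];
    std::string line;
    while (std::fgets(piece, sizeof piece, fp)) {
        line += piece;  // long lines arrive in several pieces
        if (!line.empty() && line.back() == '\n') {
            line.pop_back();  // drop the newline
            records.push_back(line);
            line.clear();
        }
    }
    if (!line.empty()) {
        records.push_back(line);  // final line without a trailing newline
    }
    std::fclose(fp);
    return records;
}
```

Trying a few sizes, for example read_records_stdio("example.csv", 1 << 16) versus 1 << 20, and timing each run is the simplest way to see whether the buffer size matters on a given machine.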
Upvotes: 0