Reputation: 3657
I'm trying to read from a file in a faster way. The way I'm currently doing it is shown below, but it is very slow for large files. I am wondering if there is a faster way to do this? I need the values stored in a struct, which I have defined below.
std::vector<matEntry> matEntries;
inputfileA.open(matrixAfilename.c_str());
// Read from file to continue setting up sparse matrix A
while (!inputfileA.eof()) {
// Read row, column, and value into vector
inputfileA >> row; // row
inputfileA >> col; // col
inputfileA >> val; // value
// Add row, column, and value entry to the matrix
matEntries.push_back(matEntry());
matEntries[index].row = row-1;
matEntries[index].col = col-1;
matEntries[index].val = val;
// Increment index
index++;
}
My struct:
struct matEntry {
int row;
int col;
float val;
};
The file is formatted like this (int, int, float):
1 2 7.9
4 5 9.008
6 3 7.89
10 4 10.21
Upvotes: 2
Views: 1527
Reputation: 6332
As suggested in the comments, you should profile your code before trying to optimize. If you want to try random stuff until the performance is good enough, you can try reading it into memory first. Here's a simple example with some basic profiling written in:
#include <vector>
#include <ctime>
#include <fstream>
#include <sstream>
#include <iostream>
// Assuming something like this...
struct matEntry
{
int row, col;
double val;
};
std::istream& operator>>( std::istream& is, matEntry& e )
{
is >> e.row >> e.col >> e.val;
e.row -= 1;
e.col -= 1;
return is;
}
std::vector<matEntry> ReadMatrices( std::istream& stream )
{
auto matEntries = std::vector<matEntry>();
auto e = matEntry();
// For why this is better than your EOF test, see https://isocpp.org/wiki/faq/input-output#istream-and-while
while( stream >> e ) {
matEntries.push_back( e );
}
return matEntries;
}
int main()
{
const auto time0 = std::clock();
// Read file a piece at a time
std::ifstream inputFileA( "matFileA.txt" );
const auto matA = ReadMatrices( inputFileA );
const auto time1 = std::clock();
// Read file into memory (from http://stackoverflow.com/a/2602258/201787)
std::ifstream inputFileB( "matFileB.txt" );
std::stringstream buffer;
buffer << inputFileB.rdbuf();
const auto matB = ReadMatrices( buffer );
const auto time2 = std::clock();
std::cout << "A: " << ((time1 - time0) * CLOCKS_PER_SEC) << " B: " << ((time2 - time1) * CLOCKS_PER_SEC) << "\n";
std::cout << matA.size() << " " << matB.size();
}
Beware reading the same file on disk twice in a row since the disk caching may hide performance differences.
Other options include:
std::async() (medium difficulty: pipeline the work so the read and convert are done on different threads; hard: process the same file in separate threads)
There are other, higher-level considerations as well.
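Here is a minimal sketch of the std::async() idea, not code from this answer: it repeats the matEntry/operator>>/ReadMatrices shape from the listing above, uses placeholder file names, and only overlaps the raw read of one file with the parsing of another rather than pipelining a single file.
#include <fstream>
#include <future>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

struct matEntry { int row, col; double val; };

std::istream& operator>>( std::istream& is, matEntry& e )
{
    is >> e.row >> e.col >> e.val;
    e.row -= 1;
    e.col -= 1;
    return is;
}

std::vector<matEntry> ReadMatrices( std::istream& stream )
{
    std::vector<matEntry> entries;
    matEntry e;
    while( stream >> e )
        entries.push_back( e );
    return entries;
}

// Read a whole file into a string in one go.
std::string SlurpFile( const std::string& name )
{
    std::ifstream file( name, std::ios::binary );
    std::ostringstream buffer;
    buffer << file.rdbuf();
    return buffer.str();
}

int main()
{
    // Kick off the raw read of the second file on another thread...
    auto futureB = std::async( std::launch::async, SlurpFile, std::string( "matFileB.txt" ) );

    // ...while this thread reads and parses the first file.
    std::istringstream streamA( SlurpFile( "matFileA.txt" ) );
    const auto matA = ReadMatrices( streamA );

    // Parse the second file once its bytes have arrived.
    std::istringstream streamB( futureB.get() );
    const auto matB = ReadMatrices( streamB );

    std::cout << matA.size() << " " << matB.size() << "\n";
}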
Upvotes: 2
Reputation: 32544
In my experience, the slowest part in such code is the parsing of numeric values (especially the floating-point ones). Therefore your code is most probably CPU-bound and can be sped up through parallelization as follows:
Assuming that your data is on N lines and you are going to process it using k threads, each thread will have to handle about ⌈N/k⌉ lines. Each thread can then mmap() the file and parse its own share of the lines (for example through a std::istream that wraps an in-memory buffer). Note that this will require ensuring that the code for populating your data structure is thread safe.
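A minimal sketch of that approach follows; it is my own illustration, not code from this answer. It reads the whole file into memory instead of mmap()-ing it, splits the buffer into k pieces at newline boundaries, and lets each thread parse its piece into its own vector so no locking is needed; ReadParallel and ParseChunk are placeholder names.
#include <cstddef>
#include <fstream>
#include <functional>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

struct matEntry { int row; int col; float val; };

// Parse one chunk of "row col val" lines into that thread's own output vector.
void ParseChunk( const std::string& text, std::size_t begin, std::size_t end,
                 std::vector<matEntry>& out )
{
    if( end <= begin )
        return;
    std::istringstream is( text.substr( begin, end - begin ) );
    matEntry e;
    while( is >> e.row >> e.col >> e.val ) {
        e.row -= 1;                   // same 1-based to 0-based shift as above
        e.col -= 1;
        out.push_back( e );
    }
}

std::vector<matEntry> ReadParallel( const std::string& fileName, unsigned k )
{
    // Read the whole file into memory (an mmap() would avoid this extra copy).
    std::ifstream file( fileName, std::ios::binary );
    std::ostringstream buffer;
    buffer << file.rdbuf();
    const std::string text = buffer.str();

    // Compute k chunk boundaries, snapped forward to the next newline so that
    // no line is split between two threads.
    std::vector<std::size_t> bounds( k + 1, text.size() );
    bounds[0] = 0;
    for( unsigned i = 1; i < k; ++i ) {
        const std::size_t pos = text.find( '\n', i * text.size() / k );
        bounds[i] = ( pos == std::string::npos ) ? text.size() : pos + 1;
    }

    // Each thread writes only into its own vector, so no locking is required.
    std::vector<std::vector<matEntry>> partial( k );
    std::vector<std::thread> threads;
    for( unsigned i = 0; i < k; ++i )
        threads.emplace_back( ParseChunk, std::cref( text ), bounds[i], bounds[i + 1],
                              std::ref( partial[i] ) );
    for( auto& t : threads )
        t.join();

    // Stitch the per-thread results back together in order.
    std::vector<matEntry> all;
    for( const auto& p : partial )
        all.insert( all.end(), p.begin(), p.end() );
    return all;
}
Merging the per-thread vectors at the end keeps the entries in file order, matching what the original single-threaded loop produced.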
Upvotes: 2
Reputation: 2723
To make things easier, I'd define an input stream operator for your struct.
std::istream& operator>>(std::istream& is, matEntry& e)
{
is >> e.row >> e.col >> e.val;
e.row -= 1;
e.col -= 1;
return is;
}
Regarding speed, there is not much to improve without dropping to a much lower level of file I/O. I think the only other thing you could do is to size your vector up front so that it doesn't keep reallocating inside the loop. With the input stream operator defined, it looks much cleaner as well:
std::vector<matEntry> matEntries;
matEntries.resize(numberOfLines);
inputfileA.open(matrixAfilename.c_str());
// Read from file to continue setting up sparse matrix A
std::size_t index = 0;
while(index < numberOfLines && (inputfileA >> matEntries[index++]))
{ }
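If numberOfLines isn't known up front, one alternative (my own sketch, not part of this answer; ReadEntries and the rough 16-bytes-per-line estimate are assumptions) is to reserve() a capacity estimated from the file size and keep the extraction loop:
#include <cstddef>
#include <fstream>
#include <istream>
#include <string>
#include <vector>

struct matEntry { int row; int col; float val; };

std::istream& operator>>( std::istream& is, matEntry& e )
{
    is >> e.row >> e.col >> e.val;
    e.row -= 1;
    e.col -= 1;
    return is;
}

std::vector<matEntry> ReadEntries( const std::string& matrixAfilename )
{
    std::ifstream inputfileA( matrixAfilename );

    // Estimate the entry count from the file size (assuming very roughly
    // 16 bytes per "row col val" line) so the vector rarely reallocates.
    inputfileA.seekg( 0, std::ios::end );
    const auto bytes = static_cast<std::streamoff>( inputfileA.tellg() );
    inputfileA.seekg( 0, std::ios::beg );

    std::vector<matEntry> matEntries;
    if( bytes > 0 )
        matEntries.reserve( static_cast<std::size_t>( bytes ) / 16 );

    matEntry e;
    while( inputfileA >> e )
        matEntries.push_back( e );
    return matEntries;
}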
Upvotes: 3