Reputation: 67
I have a log file of variable length which may or may not contain the strings I'm looking for.
Lines have timestamps etc. followed by <parameter>#<value>; I want to check the parameter and extract the value.
The implementation below works, but I'm sure there must be a more efficient way to parse the file.
Key points:
NB: each parse function calls substr and then converts the result to an int.
Any ideas much appreciated
ifstream fileReader(logfile.c_str());
string lineIn;
if (fileReader.is_open())
{
    while (fileReader.good())
    {
        getline(fileReader, lineIn);
        if (lineIn.find("value1#") != string::npos)
        {
            parseValue1(lineIn);
        }
        else if (lineIn.find("value2#") != string::npos)
        {
            parseValue2(lineIn);
        }
        else if (lineIn.find("value3#") != string::npos)
        {
            parseValue3(lineIn);
        }
    }
}
fileReader.close();
Upvotes: 0
Views: 2303
Reputation: 57749
Your execution bottleneck will be in file I/O.
I suggest that you haul in as much data as possible in one fetch into a buffer. Next, search the buffer for your tokens.
You have to read in the text in order to search it, so you might as well read in as much of the file as you can.
There may be some drawbacks to reading too much data into memory. If the OS can't fit all the data, it may page it out to a hard drive, which makes the technique worthless (unless you want the OS to handle reading the file in chunks).
Once the file is in memory, the choice of searching technique is likely to make only a negligible difference.
Upvotes: 0
Reputation: 44268
First of all, you are writing the loop wrong. Your code should be:
while( getline( fileReader,lineIn ) ) {
}
Second, lines:
if( fileReader.is_open() )
and
fileReader.close();
are redundant. As for speed, I would recommend using a regular expression:
std::regex reg( "(value1#|value2#|value3#)(\\d+)" );
while( getline( fileReader, lineIn ) ) {
    std::smatch m;
    if( std::regex_search( lineIn.begin(), lineIn.end(), m, reg ) ) {
        std::cout << "found: " << m[2] << std::endl;
    }
}
}
Of course, you would need to adapt the regular expression to your actual tags.
Unfortunately, iostreams are known to be pretty slow. If you still don't get enough performance, you may consider replacing fstream with FILE * or mmap.
Upvotes: 1
Reputation: 129524
The first step would be to figure out how much of the time is spent in the if(lineIn.find(...)...
and how much is the actual reading of the input file.
Time how long your application runs (you may want to take a selection of log files, rather than ALL of them). You may want to run this a few times in a row to check that you get (approximately) the same value.
Then add:
#if 0
if (lineIn.find(...) ...)
...
#endif
and compare the time it takes. My guess is that it won't actually make that much of a difference. However, if the searching is a major component of the time, you may find that it's beneficial to use a more clever search method. There are some pretty clever methods for searching for strings in a larger string.
I will post back with a couple of "read a file quicker" benchmarks that I've posted elsewhere. But bear in mind that the hard drive you are reading from will account for the majority of the time.
References:
getline while reading a file vs reading whole file and then splitting based on newline character
slightly less relevant, but perhaps interesting:
What is the best efficient way to read millions of integers separated by lines from text file in c++
Upvotes: 0
Reputation: 972
That looks like a lot of repeated searches over the same string, which will not be very efficient.
Parse the file/line properly instead.
There are three libraries in Boost that might be of help.
Parse the line using a regular expression: http://www.boost.org/doc/libs/1_53_0/libs/regex/doc/html/index.html
Use a tokenizer http://www.boost.org/doc/libs/1_53_0/libs/tokenizer/index.html
For full customization you can always use Spirit. http://www.boost.org/doc/libs/1_53_0/libs/spirit/doc/html/index.html
Upvotes: 0