Timir

Reputation: 151

Efficient way of searching for a string in a file in C++ for very large inputs

I have a primary file with millions of lines. While reading each line from this file, I need to look it up in a second file that has far fewer lines (only several thousand) in order to make some decision. Currently I read the second file into a vector at the start and then, for each line of the primary file, iterate over the vector to look for the line. The problem is that the running time is quite long. Is there an efficient way to perform this task and keep the running time reasonable?
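A minimal sketch of the current approach described above (file names and the "decision" are placeholders):

    #include <algorithm>
    #include <fstream>
    #include <string>
    #include <vector>

    int main() {
        // Load the small second file into a vector up front.
        std::vector<std::string> lines;
        std::ifstream second("second.txt");   // placeholder file name
        std::string line;
        while (std::getline(second, line))
            lines.push_back(line);

        // For every line of the huge primary file, scan the whole vector.
        std::ifstream primary("primary.txt"); // placeholder file name
        while (std::getline(primary, line)) {
            if (std::find(lines.begin(), lines.end(), line) != lines.end()) {
                // make some decision
            }
        }
    }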

Upvotes: 1

Views: 1839

Answers (3)

Mike Dunlavey

Reputation: 40669

You have an inner loop that compares the current line of the primary file to the lines in the secondary file. If you take some stack samples, you will probably find the program in that inner loop most of the time.

You might consider this technique, where you preprocess your secondary file into a special-purpose procedure that you then compile and link into your main program. The preprocessing time is the time to read the secondary file, plus on the order of a second or two to write the special-purpose procedure, and then to compile and link the whole thing.

Your main program should then be I/O-bound on reading the primary file, since the inner loop will be much faster.
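A minimal sketch of such a generator, under some assumptions: the secondary file has one entry per line, the file names and the function name in_secondary are made up, and the generated code uses a static std::unordered_set for simplicity (a real generator might emit more specialized code, and would also have to escape quotes and backslashes in the data):

    // Generator: run once over the secondary file, emit lookup.cpp,
    // then compile and link lookup.cpp with the main program.
    #include <fstream>
    #include <string>

    int main() {
        std::ifstream in("secondary.txt");    // assumed file name
        std::ofstream out("lookup.cpp");
        out << "#include <string>\n"
               "#include <unordered_set>\n"
               "bool in_secondary(const std::string& line) {\n"
               "    static const std::unordered_set<std::string> lines = {\n";
        std::string line;
        while (std::getline(in, line))
            out << "        \"" << line << "\",\n"; // sketch only: no escaping
        out << "    };\n"
               "    return lines.count(line) != 0;\n"
               "}\n";
    }

The main program's inner loop then reduces to a single call such as in_secondary(line).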

Upvotes: 0

mvp

Reputation: 116197

You should read the second file into a std::map<std::string,int>. The map key would be the line, and the value the number of times that line was encountered in the second file.

This way, checking whether a given line from the first file appears in the second takes only logarithmic time in the few thousand entries, which is effectively negligible, so the overall running time should be limited only by how fast the disk can read the contents of the huge first file.
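A minimal sketch of this approach, with assumed file names and the decision left as a comment:

    #include <fstream>
    #include <map>
    #include <string>

    int main() {
        // Count how many times each line occurs in the small second file.
        std::map<std::string, int> counts;
        std::ifstream second("second.txt");   // assumed file name
        std::string line;
        while (std::getline(second, line))
            ++counts[line];

        // Stream the huge primary file and look each line up in the map.
        std::ifstream primary("primary.txt"); // assumed file name
        while (std::getline(primary, line)) {
            auto it = counts.find(line);
            if (it != counts.end()) {
                // it->second = occurrences in the second file;
                // make the decision here
            }
        }
    }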

Upvotes: 1

zabulus

Reputation: 2513

You can try replacing the vector that holds the second (smaller) file with a std::set.
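A minimal sketch, same structure as the map version above but with a std::set (file names are assumptions):

    #include <fstream>
    #include <set>
    #include <string>

    int main() {
        // Load the small second file into a std::set for fast membership tests.
        std::set<std::string> lines;
        std::ifstream second("second.txt");   // assumed file name
        std::string line;
        while (std::getline(second, line))
            lines.insert(line);

        // Check each line of the huge primary file against the set.
        std::ifstream primary("primary.txt"); // assumed file name
        while (std::getline(primary, line)) {
            if (lines.count(line)) {
                // the line is present in the second file
            }
        }
    }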

Upvotes: 0
