Reputation: 11
I've got two text files: The first has ~100,000 words and the other has ~850,000 words. Both have been parsed into separate Vectors. If a word is in both files, I need to do something.
I've written some C++ code that loops through the first and the second file, but the time complexity is O(n^2) which with files this big is taking forever to run through. Even after 15 minutes it doesn't seem close to being finished.
for (string word1 : firstTextFile)
{
for (string word2 : secondTextFile)
{
if (word1 == word2)
{
doSomething();
}
}
}
Is there a faster way to do this? I've searched everywhere but I've got no idea what to do. Any help would be great, thanks!
Upvotes: 1
Views: 368
Reputation: 38335
#include <algorithm>
for (string word1 : firstTextFile) {
if (std::binary_search(secondTextFile.begin(), secondTextFile.end(), word1) {
doSomething();
}
}
Complexity above is O(firstTextFile.size() * log(secondTextFile.size()).
If you would use std::unoredered_set<std::string> secondTextFile
instead of std::vector<std::string> secondTextFile
:
for (string word1 : firstTextFile) {
if (secondTextFile.count(word1)) {
doSomething();
}
}
Complexity is O(firstTextFile.size()).
Additionally you would save time on inserting and sorting words into secondTextFile
: O(secondTextFile.size()) instead of O(secondTextFile.size() * log(secondTextFile.size())).
Upvotes: 2
Reputation: 13073
As both vectors have been sorted, then the algorithm to achieve this is akin to a merge sort.
There is a linear walk through the lists, with the algorithm trying to keep both lists at about the same part of the dictionary ordering.
while( worda && wordb ){
if( worda == wordb ){
DoSomething();
worda = nextWordFromA();
wordb = nextWordFromB();
} else if ( worda < wordb ) {
worda = nextWordFromA();
} else {
wordb = nextWordFromB();
}
}
Upvotes: 1
Reputation: 988
Short answer: yes.
The std::set_intersection
function handles this case in linear time. If you are able to, simply use that.
Upvotes: 4