Reputation: 11

Check if a word from one text file is in a second text file (C++)

I've got two text files: The first has ~100,000 words and the other has ~850,000 words. Both have been parsed into separate Vectors. If a word is in both files, I need to do something.

I've written some C++ code that loops through the first and the second file, but the time complexity is O(n^2) which with files this big is taking forever to run through. Even after 15 minutes it doesn't seem close to being finished.

for (string word1 : firstTextFile)
            {
                for (string word2 : secondTextFile)
                {
                    if (word1 == word2)
                    {
                        doSomething();
                    }
                }
            }

Is there a faster way to do this? I've searched everywhere but I've got no idea what to do. Any help would be great, thanks!

Upvotes: 1

Answers (3)

3CxEZiVlQ

Reputation: 38335

#include <algorithm>

for (string word1 : firstTextFile) {
  if (std::binary_search(secondTextFile.begin(), secondTextFile.end(), word1) {
    doSomething();
  }
}

Complexity above is O(firstTextFile.size() * log(secondTextFile.size()).

If you would use std::unoredered_set<std::string> secondTextFile instead of std::vector<std::string> secondTextFile:

for (string word1 : firstTextFile) {
  if (secondTextFile.count(word1)) {
    doSomething();
  }
}

Complexity is O(firstTextFile.size()).

Additionally you would save time on inserting and sorting words into secondTextFile: O(secondTextFile.size()) instead of O(secondTextFile.size() * log(secondTextFile.size())).

Upvotes: 2

mksteve

Reputation: 13073

As both vectors have been sorted, then the algorithm to achieve this is akin to a merge sort.

There is a linear walk through the lists, with the algorithm trying to keep both lists at about the same part of the dictionary ordering.

while( worda && wordb ){
    if( worda == wordb ){
       DoSomething();
       worda = nextWordFromA();
       wordb = nextWordFromB();
    } else if ( worda < wordb ) {
       worda = nextWordFromA();
    } else {
       wordb = nextWordFromB();
    }
}

Upvotes: 1

Ethan McCue

Reputation: 988

Short answer: yes.

The std::set_intersection function handles this case in linear time. If you are able to, simply use that.

(reference)

Upvotes: 4

Check if a word from one text file is in a second text file (C++)

Answers (3)

Related Questions