ofey
ofey

Reputation: 443

Fast string search?

I have a vector of strings and have to check if each element in vector is present in a given list of 5000 words. Besides the mundane method of two nested loops, is there any faster way to do this in C++?

Upvotes: 9

Views: 4652

Answers (4)

Pete Becker
Pete Becker

Reputation: 76523

So you have a vector of strings, with each string having one or more words, and you have a vector that's a dictionary, and you're supposed to determine which words in the vector of strings are also in the dictionary? The vector of strings is an annoyance, since you need to look at each word. I'd start by creating a new vector, splitting each string into words, and pushing each word into the new vector. Then sort the new vector and run it through the std::unique algorithm to eliminate duplicates. Then sort the dictionary. Then run both ranges through std::set_intersection to write the result.

Upvotes: 2

hege
hege

Reputation: 1017

You could sort the vector, then you can solve this with one "loop" (taken that your dictionary is sorted too) which means O(n) not counting in the cost of the sort.

Upvotes: 2

Philipp
Philipp

Reputation: 69773

You should put the list of strings into an std::set. It's a data structure optimized for searching. Finding if a given element is in the set or not is an operation which is much faster than iterating all entries.

When you are already using C++11, you can also use the std::unordered_set which is even faster for lookup, because it's implemented as a hash table.

Should this be for school/university: Be prepared to explain how these data structures manage to be faster. When your instructor asks you to explain why you used them, "some guys on the internet told me" is unlikely to earn you a sticker in the class book.

Upvotes: 11

Baptiste Wicht
Baptiste Wicht

Reputation: 7683

You could put the list of words in an std::unordered_set. Then, for each element in the vector, you just have to test if it is in the unordered_set in O(1). You would have an expected complexity of O(n) (look at the comment to see why it is only expected).

Upvotes: 3

Related Questions