Life after Guest

Reputation: 327

What's the best approach to find words from a set of words in a string?

I need to detect the presence of certain words (including multi-word expressions, such as "bag of words") in a user-submitted string.

I need to find the exact word, not part of it, so the strstr/strpos/stripos family is not an option for me.

My current approach (PHP/PCRE regex) is the following:

\b(first word|second word|many other words)\b

Is there a better approach? Am I missing something important?

There are about 1500 words in the list.

Any help is appreciated.

Upvotes: 1

Views: 101

Answers (1)

Robert P

Reputation: 15968

A regular expression like the one you're showing will work, and it also handles phrases that contain spaces. It may become hard to maintain if the list of words grows long or changes often, so it fits best when the list stays fairly stable.
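One way to ease the maintenance burden is to keep the words in a plain list and generate the pattern from it. The following is only a sketch: the list contents and the function names buildWordPattern and findWords are made up for illustration, case-insensitive matching is an assumption, and it needs PHP 7.4+ for the arrow functions. Entries are sorted longest first because PCRE tries alternatives in order, so "bag" listed before "bag of words" would otherwise win.

```php
<?php
// Hypothetical word list; in practice this would hold the ~1500 entries.
$words = ['bag', 'bag of words', 'second word'];

function buildWordPattern(array $words): string
{
    // Longest entries first, so "bag of words" is tried before "bag".
    usort($words, fn($a, $b) => strlen($b) <=> strlen($a));
    // Escape regex metacharacters, including the "/" delimiter.
    $escaped = array_map(fn($w) => preg_quote($w, '/'), $words);
    // Assumption: case-insensitive matching is wanted (the "i" flag).
    return '/\b(?:' . implode('|', $escaped) . ')\b/i';
}

function findWords(string $input, array $words): array
{
    preg_match_all(buildWordPattern($words), $input, $matches);
    // Distinct matched strings, exactly as they appear in the input.
    return array_values(array_unique($matches[0]));
}
```

With this list, findWords('A bag of words and a handbag', $words) should report only "bag of words": the word boundary keeps "handbag" from matching "bag", and the longer phrase consumes the standalone "bag".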

If none of the words you're looking for contain spaces, you could split the input string on whitespace (\s+, see https://www.php.net/manual/en/function.preg-split.php ), then check whether any of the resulting tokens are in a Set (https://www.php.net/manual/en/class.ds-set.php) made up of the words you're looking for. Note that Ds\Set comes from the Data Structures (ds) extension, which has to be installed separately. This will be a bit more code, but less regex maintenance, so YMMV based on your application.
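A rough sketch of the split-and-lookup idea follows. It swaps Ds\Set for a plain array used as a hash lookup (array_flip plus isset), so it runs without the ds extension. The word list and the function name findSingleWords are hypothetical, and lowercasing and punctuation trimming are assumptions about what counts as "the same word".

```php
<?php
// Hypothetical single-word list (no entry contains spaces).
$targets = ['apple', 'banana', 'cherry'];

function findSingleWords(string $input, array $targets): array
{
    // Flip the list into array keys so each lookup is a hash check.
    $lookup = array_flip(array_map('strtolower', $targets));
    // Split on whitespace; drop empty pieces from leading/trailing spaces.
    $tokens = preg_split('/\s+/', $input, -1, PREG_SPLIT_NO_EMPTY);
    $found = [];
    foreach ($tokens as $token) {
        // Assumption: strip surrounding punctuation so "Banana," matches.
        $word = strtolower(trim($token, ".,;:!?\"'()"));
        if (isset($lookup[$word])) {
            $found[$word] = true; // keyed by word, so duplicates collapse
        }
    }
    return array_keys($found);
}
```

For example, findSingleWords('I like Banana, apple pie and pineapple.', $targets) should return "banana" and "apple" but not "pineapple", since tokens are compared whole.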

If the set includes phrases with spaces, consider using a trie instead. Wiktor Stribiżew suggests: https://github.com/sters/php-regexp-trie
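To illustrate the trie idea for multi-word phrases, here is a hand-rolled sketch. It does not use the linked library (its API is not shown here, so nothing about it is assumed). Instead it builds a trie keyed by word tokens and walks the input token by token. The function names buildPhraseTrie and findPhrases, the '#' end-of-phrase marker, and the ASCII-only \w+ tokenization are all assumptions made for the example.

```php
<?php
// Build a trie keyed by word tokens; the '#' key marks the end of a phrase.
// '#' cannot collide with a token because tokens only contain \w characters.
function buildPhraseTrie(array $phrases): array
{
    $trie = [];
    foreach ($phrases as $phrase) {
        preg_match_all('/\w+/', strtolower($phrase), $m);
        $node = &$trie;
        foreach ($m[0] as $token) {
            if (!isset($node[$token])) {
                $node[$token] = [];
            }
            $node = &$node[$token]; // descend one level
        }
        $node['#'] = $phrase; // remember which phrase ends here
        unset($node);         // drop the reference before the next phrase
    }
    return $trie;
}

// Report every listed phrase that occurs as a run of whole words in $input.
function findPhrases(string $input, array $trie): array
{
    preg_match_all('/\w+/', strtolower($input), $m);
    $tokens = $m[0];
    $count = count($tokens);
    $found = [];
    for ($i = 0; $i < $count; $i++) {
        $node = $trie;
        // Follow the trie for as long as the next token has a branch.
        for ($j = $i; $j < $count && isset($node[$tokens[$j]]); $j++) {
            $node = $node[$tokens[$j]];
            if (isset($node['#'])) {
                $found[$node['#']] = true;
            }
        }
    }
    return array_keys($found);
}
```

Unlike the longest-first regex, this version reports overlapping entries too: with the phrases "bag" and "bag of words", the input "A bag of words" yields both. Whether that is desirable depends on the application.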

Upvotes: 1
