Alexander

Reputation: 2463

Using regex to search for keywords at the beginning of words only

I have a search system that splits the keyword into chunks and searches for them in a string like this:

var regexp_school = new RegExp("(?=.*" + split_keywords[0] + ")(?=.*" + split_keywords[1] + ")(?=.*" + split_keywords[2] + ").*", "i");

I would like to modify this so that it only matches the keywords at the beginning of words.

For example if the string is:

"Bbe be eb ebb beb"

And the keyword is: "be eb"

Then I want only these to hit "be ebb eb"

In other words I want to combine the above regexp with this one:

var regexp_school = new RegExp("^" + split_keywords[0], "i");

But I'm not sure what the syntax would look like.

I'm also using the split function to split the keywords, but I don't want to set a length since I don't know how many words there are in the keyword string.

split_keywords = school_keyword.split(" ", 3);

If I leave the 3 out, will it have dynamic length or just length of 1? I tried doing a

 alert(split_keywords.lenght);

But I didn't get the response I expected.

Upvotes: 2

Views: 725

Answers (2)

ridgerunner

Reputation: 34395

A couple of points. First, anchor the regex to the start of the string. Otherwise, when there is no match, the regex engine must retry the pattern at every starting position in the string before it can declare failure. Second, when splitting the keyword string, use /\s+/ instead of a single space; this avoids empty strings in the resulting array when there are multiple spaces between keywords. Third, if the array still contains empty strings (for example, from leading or trailing whitespace), do not add them to the regex. Felix's solution is pretty close to the mark, but it consists only of lookaheads, so it does not actually consume the string once all the positive lookahead assertions have succeeded. That said, here's my proposed solution:

var split_keywords = school_keyword.split(/\s+/);
var regex = "^"; // Anchor to start of string.
for (var i = 0, len = split_keywords.length; i < len; ++i) {
    if (split_keywords[i]) { // Skip empty keyword strings.
        regex += "(?=.*?\\b" + split_keywords[i] + ")";
    }
}
regex += ".*$"; // Add ending to actually match the line.
var regexp_school = new RegExp(regex, "i");

I've also changed the greedy quantifier to lazy. This is one case where it is applicable.
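As a rough sketch of how the loop above might be packaged and checked against the example from the question: the function names `escapeRegExp` and `buildPrefixRegex` are hypothetical, and the escaping step is an addition that is not part of the original code. It assumes keywords should be treated as literal text rather than as regex fragments.

```javascript
// Hypothetical helper (not in the original answer): escape regex
// metacharacters so a keyword like "c++" is matched literally.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical wrapper around the loop shown above.
function buildPrefixRegex(keywordString) {
    // No limit argument: split returns one entry per chunk.
    var keywords = keywordString.split(/\s+/);
    var source = "^"; // Anchor to start of string.
    for (var i = 0, len = keywords.length; i < len; ++i) {
        if (keywords[i]) { // Skip empty keyword strings.
            source += "(?=.*?\\b" + escapeRegExp(keywords[i]) + ")";
        }
    }
    return new RegExp(source + ".*$", "i");
}

var re = buildPrefixRegex("be eb");
console.log(re.test("Bbe be eb ebb beb")); // true: "be" and "eb" both start a word
console.log(re.test("Bbe beb"));           // false: no word starts with "eb"
```

Note that the regex tests the string as a whole: it succeeds only if every keyword appears at the start of some word. It does not return the individual matching words.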

Upvotes: 1

Felix Kling

Reputation: 816442

You should use the word boundary assertion \b to match the beginning of a word. To create the expression for an arbitrary number of keywords, you can generate it in a loop.

var regex = '';

for(var i = split_keywords.length;i--; ) {
    // two backslashes are needed so the string contains a literal `\`
    regex += "(?=.*\\b" + split_keywords[i] + ")";
}

var regexp_school = new RegExp(regex, "i");

I'm not sure about performance, but you can also consider using indexOf to test whether a substring is contained in a string.
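A minimal sketch of what an indexOf-based check could look like, with no regex for the matching itself. The function name `matchesAllPrefixes` is hypothetical, and it assumes that words are separated by whitespace only, so it will not behave exactly like \b around punctuation.

```javascript
// Hypothetical helper: returns true if every keyword is a prefix of
// at least one whitespace-separated word in the text (case-insensitive).
function matchesAllPrefixes(text, keywordString) {
    var words = text.toLowerCase().split(/\s+/);
    var keywords = keywordString.toLowerCase().split(/\s+/);
    for (var i = 0; i < keywords.length; ++i) {
        if (!keywords[i]) {
            continue; // Skip empty strings from leading/trailing spaces.
        }
        var found = false;
        for (var j = 0; j < words.length; ++j) {
            // indexOf returns 0 only when the word starts with the keyword.
            if (words[j].indexOf(keywords[i]) === 0) {
                found = true;
                break;
            }
        }
        if (!found) {
            return false;
        }
    }
    return true;
}

console.log(matchesAllPrefixes("Bbe be eb ebb beb", "be eb")); // true
console.log(matchesAllPrefixes("Bbe beb", "be eb"));           // false
```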

Update:

If \b does not work for you (because of other "special" characters), and all your words are separated by a white space, you can use

"(?=.*\\s" + split_keywords[i] + ")"

or

"(?=.* " + split_keywords[i] + ")"

But for this to work you have to prepend the text you are searching in with a white space:

" " + textYouSearchIn

or you can write a more complex expression:

"(?=(^|.*\\s)" + split_keywords[i] + ")"

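To illustrate when the whitespace-based variant helps, here is a small sketch comparing it with \b on a keyword that begins with a non-ASCII letter. In JavaScript, \b is defined in terms of \w, which covers only [A-Za-z0-9_], so a letter like "é" is not treated as a word character. The function name `buildWhitespacePrefixRegex` is hypothetical, and the sketch assumes the keywords contain no regex metacharacters.

```javascript
// Hypothetical helper: builds the "(^|.*\s)" variant for a list of keywords.
// Assumes keywords contain no regex metacharacters.
function buildWhitespacePrefixRegex(keywords) {
    var source = "^";
    for (var i = 0; i < keywords.length; ++i) {
        source += "(?=(^|.*\\s)" + keywords[i] + ")";
    }
    return new RegExp(source, "i");
}

var wordBoundary = new RegExp("\\bé", "i");
var whitespace = buildWhitespacePrefixRegex(["é"]);

// \b sees no boundary between a space and "é" (both are non-word characters),
// and it does see one between "c" and "é" in the middle of "lycée".
console.log(wordBoundary.test("une école")); // false
console.log(wordBoundary.test("lycée"));     // true (a mid-word match)

// The whitespace variant only accepts the start of the string or a
// preceding whitespace character.
console.log(whitespace.test("une école"));   // true
console.log(whitespace.test("lycée"));       // false
```
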
Upvotes: 2
