Reputation: 4868
I'm trying to build a search feature for a project which narrows down items based on a user search input and if it matches the keywords listed against items. For this, I'm saving the item keywords in a data
attribute and matching the query with these keywords using a RegExp pattern.
I'm currently using this expression, which I know is not correct and need your help on that:
new RegExp('\\b(' + query + ')', 'gi')))
where query is |
separated values of the query entered by the user (e.g. \\b(meat|pasta|dinner)
). This returns me a match even if there is only 1 match, say for example - meat
Just to throw some context, here's a small example:
If a user types: meat pasta dinner
it should list all items which have ALL the 3 keywords listed against them i.e. meat
pasta
and dinner
. These are independent of the order they're typed in.
Can you help me with an expression which will match ALL words in a query, in any order?
Upvotes: 44
Views: 53265
Reputation: 6570
stema's answer is technically correct, but it doesn't take performance into account at all. Look aheads are extremely slow (in the context of regular expressions, which are lightning fast). Even with the current logic, the regular expression is not optimal.
So here are some measurements, calculated on larger strings which contain all three words, running the search 1000 times and using four different approaches:
stema's regular expression
/^(?=.*\bmeat\b)(?=.*\bpasta\b)(?=.*\bdinner\b).+/
result: 605ms
optimized regular expression
/^(?=.*?\bmeat\b)(?=.*?\bpasta\b)(?=.*?\bdinner\b)/
uses lazy matching and doesn't need the end all selector
result: 291ms
permutation regular expression
/(\bmeat\b.*?(\bpasta\b.*?\bdinner\b|\bdinner\b.*?\bpasta\b)|\bpasta\b.*?(\bmeat\b.*?\bdinner\b|\bdinner\b.*?\bmeat\b)|\bdinner\b.*?(\bpasta\b.*?\bmeat\b|\bmeat\b.*?\bpasta\b))/
result: 56ms
this is fast because the first pattern is matching, if the last pattern matched, it would be even slower than the look ahead one (300 ms)
array of regular expressions
var regs=[/\bmeat\b/,/\bpasta\b/,/\bdinner\b/];
var result = regs.every(reg=>reg.test(text));
result: 26ms
Note that if the strings are crafted to not match, then the results are:
As you can see, in all cases just using a loop is an order of magnitude faster, not to mention easier to read.
The original question was asking for a regular expression, so my answer to that is the permutation regular expression, but I would not use it, as its size would grow exponentially with the number of search words.
Also, in most cases this performance issue is academic, but it is necessary to be highlighted.
Upvotes: 10
Reputation: 3686
Based on the accepted answer I wrote a simple Java method that builds the regex from an array of keywords
public static String regexIfAllKeywordsExists(String[] keywords) {
StringBuilder sb = new StringBuilder("^");
for (String keyword : keywords) {
sb.append("(?=.*\\b");
sb.append(keyword);
sb.append("\\b)");
}
sb.append(".+");
return sb.toString();
}
Upvotes: 2
Reputation: 92986
You can achieve this will lookahead assertions
^(?=.*\bmeat\b)(?=.*\bpasta\b)(?=.*\bdinner\b).+
See it here on Regexr
(?=.*\bmeat\b)
is a positive lookahead assertion, that ensures that \bmeat\b
is somewhere in the string. Same for the other keywords and the .+
is then actually matching the whole string, but only if the assertions are true.
But it will match also on "dinner meat Foobar pasta"
Upvotes: 69
Reputation: 3669
your regex looks pretty good:
\b(meat|pasta|dinner)\b
Check that the length of matches equals the number of keywords (in this case, three):
string.match(re).length === numberOfKeywords
where re
is the regex with a g
flag, string
is the data and numberOfKeywords
is the number of keywords
This assumes that there are no repeated keywords.
Upvotes: 5