Reputation: 4241
I've developed the following regular expression to use in a search field.
The goal is to match up to 2 words before the search term, then the full word containing the character(s), and everything after it:
/^
.*? # match anything before, as few times as possible
(
(?:
[^\s]+\s* # anything followed by whitespace
){1,2} # match once or twice
\s*? # match whitespaces that may be left behind, just in case
[^\s]*? # match the beginning of the word, if exists
)?
(foo|bar) # search term(s)
([^\s]*\s*.*) # whatever is after, with whitespace, if it is the end of the word
$/xi
The problem is that it isn't always matching correctly.
A few examples, when searching for "a":
Fantastic drinks and amazing cakes
Expected match:
$1 = F
$2 = a
$3 = ntastic drinks and amazing cakes
Result:
$1 = Fantastic drinks (space)
$2 = a
$3 = nd amazing cakes
-----------------------------------------
Drinks and party!
Expected match:
$1 = Drinks (space)
$2 = a
$3 = nd party!
Result:
$1 = Drinks and p
$2 = a
$3 = rty!
------------------------------------------
Drinks will be served at the caffetary in 5 minutes
Expected match:
$1 = be served (space)
$2 = a
$3 = t the caffetary in 5 minutes
Result (matches correctly):
$1 = be served (space)
$2 = a
$3 = t the caffetary in 5 minutes
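A minimal Python sketch that reproduces the "Result" values above outside regex101. It rests on a few assumptions: Python's re module stands in for the PCRE engine that regex101 uses, re.VERBOSE and re.IGNORECASE play the role of the /x and /i flags, and the (foo|bar) placeholder is replaced by the literal search term a. The names ORIGINAL and samples are illustrative only.

```python
import re

# The question's pattern with the (foo|bar) placeholder replaced by "a".
# re.VERBOSE stands in for the /x flag and re.IGNORECASE for /i.
ORIGINAL = re.compile(r"""
    ^
    .*?                 # match anything before, as few times as possible
    (
      (?:
        [^\s]+\s*       # anything followed by whitespace
      ){1,2}            # match once or twice (greedy)
      \s*?              # match whitespace that may be left behind
      [^\s]*?           # match the beginning of the word, if it exists
    )?
    (a)                 # search term
    ([^\s]*\s*.*)       # whatever is after
    $
""", re.VERBOSE | re.IGNORECASE)

samples = [
    "Fantastic drinks and amazing cakes",
    "Drinks and party!",
    "Drinks will be served at the caffetary in 5 minutes",
]

for text in samples:
    m = ORIGINAL.match(text)
    # Prints ($1, $2, $3), or None if there is no match.
    print(m.groups() if m else None)
```

Because .*? first tries an empty prefix, while {1,2} and [^\s]+ are greedy, the engine settles on the first "a" it can reach after consuming two whole words from the start of the string, rather than on the first "a" in the string. Only when no "a" is reachable that way (the third example) does .*? grow.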
You can experiment with it at https://regex101.com/r/cI7gZ3/1 (unit tests included).
The way it fails is strange, beyond what I can describe. My guess is that it prefers matches that have 1-2 words before the search term.
What may be wrong here, and what is causing these issues?
Upvotes: 2
Views: 655
Reputation: 626893
I suggest using lazy versions of \S+ and {1,2} in
(?:
\S+?\s* # anything followed by whitespace
){1,2}?
and removing the [^\s]*? part (the one commented "match the beginning of the word, if exists").
See the updated regex demo
^
.*? # match anything before, as few times as possible
(
(?:
\S*?\s* # anything followed by whitespace
){1,2}?
\s* # just in case there's whitespace
)?
(a) # search term(s)
(\S*\s*.*) # whatever is after, without whitespace if it is the end of the word
$
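A Python sketch of the updated pattern, again assuming Python's re module behaves like the regex101 engine for this pattern. The helper split_around and its terms parameter are hypothetical, added only to show how the search-term group could be filled from a search field: each term goes through re.escape so that characters such as . or * in user input match literally, and the terms are joined with | as in the question's (foo|bar).

```python
import re

# Updated pattern from the answer, split around the search-term group.
PREFIX = r"""
    ^
    .*?                 # match anything before, as few times as possible
    (
      (?:
        \S*?\s*         # lazily: non-whitespace characters, then whitespace
      ){1,2}?           # once or twice, trying once first
      \s*               # just in case there's whitespace
    )?
"""
SUFFIX = r"""
    (\S*\s*.*)          # whatever is after
    $
"""


def split_around(text, terms):
    """Hypothetical helper: return ($1, $2, $3) for the first match, else None.

    terms is a list of non-empty search strings; re.escape makes regex
    metacharacters in them (such as "." or "*") match literally.
    """
    alternation = "|".join(re.escape(t) for t in terms)
    pattern = re.compile(
        PREFIX + "(" + alternation + ")" + SUFFIX,
        re.VERBOSE | re.IGNORECASE,
    )
    m = pattern.match(text)
    return m.groups() if m else None


for text in [
    "Fantastic drinks and amazing cakes",
    "Drinks and party!",
    "Drinks will be served at the caffetary in 5 minutes",
]:
    print(split_around(text, ["a"]))
```

With the lazy {1,2}? and \S*?, the group in $1 grows only as far as needed to reach the search term, which yields the expected $1 values for all three examples in the question. Rebuilding the pattern on every call keeps the sketch short; a real search field would probably cache the compiled pattern per set of terms.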
Upvotes: 1