Ismael Miguel

Reputation: 4241

Regex to match up to 2 full words and the next word containing the character

I've developed the following regular expression to use in a search field.
The goal is to match up to 2 words before the search term, then the full word containing the search character(s), and everything after it:

/^
    .*?                 # match anything before, as few times as possible
    (
        (?: 
            [^\s]+\s*   # anything followed by whitespace
        ){1,2}          # match once or twice
        \s*?            # match whitespaces that may be left behind, just in case
        [^\s]*?         # match the beginning of the word, if exists
    )?  
    (foo|bar)           # search term(s)
    ([^\s]*\s*.*)       # whatever is after, with whitespace, if it is the end of the word
$/xi

The problem is that it isn't always matching correctly.
A few examples, when searching for "a":

Fantastic drinks and amazing cakes

Expected match:
$1 = F
$2 = a
$3 = ntastic drinks and amazing cakes

Result:
$1 = Fantastic drinks (space)
$2 = a
$3 = nd amazing cakes

-----------------------------------------

Drinks and party!

Expected match:
$1 = Drinks (space)
$2 = a
$3 = nd party!

Result:
$1 = Drinks and p
$2 = a
$3 = rty!

------------------------------------------

Drinks will be served at the caffetary in 5 minutes

Expected match:
$1 = be served (space)
$2 = a
$3 = t the caffetary in 5 minutes

Result (matches correctly):
$1 = be served (space)
$2 = a
$3 = t the caffetary in 5 minutes

You can experiment with it on https://regex101.com/r/cI7gZ3/1 with unit tests included.

The way this fails is strange, beyond what I can describe. My guess is that it prefers matches that have 1-2 full words before the search term.

What might be wrong here? What is causing these issues?

Upvotes: 2

Views: 655

Answers (1)

Wiktor Stribiżew

Reputation: 626893

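The greedy [^\s]+ and {1,2} consume as many full words as they can before the search term is tried. I suggest using lazy versions of \S+ and {1,2}, so that the group stops as soon as the search term can match: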

(?: 
    \S+?\s* # anything followed by whitespace
){1,2}?

and removing the [^\s]*? part (the one commented "match the beginning of the word, if exists").

See the updated regex demo

^
  .*? # match anything before, as few times as possible
  (
    (?: 
      \S*?\s* # anything followed by whitespace
    ){1,2}?
    \s* # just in case there's whitespace
  )?
  (a) # search term(s)
  (\S*\s*.*) # whatever is after, without whitespace if it is the end of the word
$
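
To compare the two behaviours outside regex101, here is a minimal sketch in Python. It assumes that Python's re module with re.VERBOSE and re.IGNORECASE is close enough to the x and i flags for this pattern. The names GREEDY, LAZY and groups are only illustrative, and the search term is hard-coded to "a" as in the question's examples.

```python
import re

FLAGS = re.VERBOSE | re.IGNORECASE

# Pattern from the question (greedy), with the search term fixed to "a".
GREEDY = re.compile(r"""
    ^
    .*?
    (
        (?: [^\s]+\s* ){1,2}    # greedy: takes two whole words when it can
        \s*?
        [^\s]*?
    )?
    (a)
    ([^\s]*\s*.*)
    $
""", FLAGS)

# Pattern from this answer (lazy), same search term.
LAZY = re.compile(r"""
    ^
    .*?
    (
        (?: \S*?\s* ){1,2}?     # lazy: stops as soon as (a) can match
        \s*
    )?
    (a)
    (\S*\s*.*)
    $
""", FLAGS)


def groups(pattern, text):
    """Return ($1, $2, $3) for text, or None if there is no match."""
    m = pattern.match(text)
    return m.groups() if m else None


samples = [
    "Fantastic drinks and amazing cakes",
    "Drinks and party!",
    "Drinks will be served at the caffetary in 5 minutes",
]

for text in samples:
    print(text)
    print("  greedy:", groups(GREEDY, text))
    print("  lazy:  ", groups(LAZY, text))
```

For the first sample, the greedy pattern should give $1 = "Fantastic drinks " while the lazy one gives $1 = "F". The third sample gives $1 = "be served " with both patterns, because there the first "a" only appears after two full words.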

Upvotes: 1
