Norix

Reputation: 33

Regex positive lookahead not matching as expected

I have to use a regular expression to match several strings and capture parts of the string.

Example strings could look like:



The goal is to lazily match and capture the middle name of robert palmer up to the point where the surname (palmer) appears in the string, AND ensure that the rest of the string matches the static text (robert ___ palmer sent for the boat).

I have used a positive lookahead to find the middle name and stop matching when palmer is found:

/robert (.+?)(?=\spalmer) palmer/

which correctly matches:

robert eric palmer

robert eric william palmer

and correctly doesn't match:

robert eric william palmer palmer


The problem:

When I add the rest of the static text to the regex:

/robert (.+?)(?=\spalmer) palmer sent for the boat/

it incorrectly matches:

robert eric william palmer palmer sent for the boat
robert eric palmer palmer sent for the boat
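A minimal Python sketch reproduces the problem. It assumes Python's standard `re` module, whose lookahead and lazy-quantifier behavior agrees with JavaScript's for this pattern. The helper `capture_middle` is hypothetical. The lazy group first stops before the first " palmer", but when the static tail then fails, the engine backtracks and extends the group until the tail can match.

```python
import re

# The question's pattern: lazy group, a lookahead, then the static tail.
QUESTION_PATTERN = re.compile(r"robert (.+?)(?=\spalmer) palmer sent for the boat")


def capture_middle(text):
    """Return capture group 1, or None if the pattern does not match."""
    match = QUESTION_PATTERN.search(text)
    return match.group(1) if match else None


# Backtracking lets the lazy group swallow one "palmer" so that
# " palmer sent for the boat" can still match after it.
for text in [
    "robert eric william palmer palmer sent for the boat",
    "robert eric palmer palmer sent for the boat",
]:
    print(repr(capture_middle(text)))
```

This prints 'eric william palmer' and 'eric palmer': "lazy" only means "as short as possible while the whole pattern still matches".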

How can I lazy match up to palmer for the middle name and still assert the rest of the static text matches?

I hope this makes sense!

Upvotes: 2

Views: 129

Answers (3)

bobble bubble

Reputation: 18545

As already mentioned, the lookahead in your sample is unneeded. If you want to lazily match the part up to palmer, allow further optional occurrences of palmer, and require a specific substring after that, add all of it to the pattern.

robert (.+?) palmer(?:.* palmer)? sent for the boat

The optional greedy group (?:.* palmer)? consumes the gap between the lazy part and sent for the boat.

See this demo at regex101. (?: opens a non-capturing group.)
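As a rough local check of the pattern above (a sketch using Python's `re` module; the helper name `capture_middle` is hypothetical), the lazy group now stops at the first " palmer", and the optional greedy group absorbs any later " palmer" before the static tail:

```python
import re

# Lazy middle name, " palmer", an optional greedy gap that must end in
# another " palmer", then the static tail.
PATTERN = re.compile(r"robert (.+?) palmer(?:.* palmer)? sent for the boat")


def capture_middle(text):
    """Return capture group 1, or None if the pattern does not match."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


for text in [
    "robert eric palmer sent for the boat",
    "robert eric palmer palmer sent for the boat",
    "robert eric william palmer palmer sent for the boat",
]:
    print(repr(capture_middle(text)))
```

This prints 'eric', 'eric' and 'eric william', so the extra palmer no longer ends up inside the captured middle name.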


If only consecutive repetitions of palmer can follow, one idea is to use robert (.+?) (?:palmer )+sent for the boat
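A small sketch of the consecutive-palmer variant, again assuming Python's `re` module and a hypothetical helper name. The repeated group (?:palmer )+ eats one or more "palmer " runs right before the static tail:

```python
import re

# Lazy middle name, then one or more consecutive "palmer " tokens, then the tail.
CONSECUTIVE = re.compile(r"robert (.+?) (?:palmer )+sent for the boat")


def capture_middle(text):
    """Return capture group 1, or None if the pattern does not match."""
    match = CONSECUTIVE.search(text)
    return match.group(1) if match else None


for text in [
    "robert eric palmer sent for the boat",
    "robert eric william palmer palmer sent for the boat",
    "robert eric palmer palmer palmer sent for the boat",
]:
    print(repr(capture_middle(text)))
```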

Upvotes: 1

Wiktor Stribiżew

Reputation: 627264

You may use

robert ((?:(?!palmer).)+?) palmer sent for the boat

See the regex demo.

Details

  • robert - a literal substring
  • ((?:(?!palmer).)+?) - capturing group #1 with a tempered greedy token: it matches any char (.), one or more times but as few times as possible, where each matched char must not start a palmer char sequence
  • palmer sent for the boat - a literal substring.
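The tempered greedy token can be tried locally with a minimal Python sketch (standard `re` module; the helper name `capture_middle` is hypothetical). Because no character of the group may start "palmer", the group cannot grow past the first palmer, so a doubled palmer makes the whole match fail instead of being captured:

```python
import re

# Each char of group 1 is checked by (?!palmer) before it is consumed.
TEMPERED = re.compile(r"robert ((?:(?!palmer).)+?) palmer sent for the boat")


def capture_middle(text):
    """Return capture group 1, or None if the pattern does not match."""
    match = TEMPERED.search(text)
    return match.group(1) if match else None


for text in [
    "robert eric palmer sent for the boat",
    "robert eric william palmer sent for the boat",
    "robert eric william palmer palmer sent for the boat",
]:
    print(repr(capture_middle(text)))
```

The first two strings yield 'eric' and 'eric william'; the third prints None.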

To unroll the pattern for better performance, use

robert ([^p]*(?:p(?!almer)[^p]*)*) palmer sent for the boat

See this regex demo.
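A quick Python comparison of the tempered and unrolled forms (a sketch with the standard `re` module; `capture` and the sample strings are illustrative). The unrolled form matches runs of non-"p" chars and allows a "p" only when it does not start "palmer". On these samples both give the same result, including middle names that contain a "p". One small difference: [^p]* can match an empty middle name, whereas the +? in the tempered form requires at least one char.

```python
import re

TEMPERED = re.compile(r"robert ((?:(?!palmer).)+?) palmer sent for the boat")
# Unrolled: [^p]* runs, each "p" allowed only if it is not followed by "almer".
UNROLLED = re.compile(r"robert ([^p]*(?:p(?!almer)[^p]*)*) palmer sent for the boat")


def capture(pattern, text):
    """Return capture group 1 of `pattern` in `text`, or None on no match."""
    match = pattern.search(text)
    return match.group(1) if match else None


samples = [
    "robert eric palmer sent for the boat",
    "robert peter paul palmer sent for the boat",
    "robert eric palmer palmer sent for the boat",
]
for text in samples:
    print(repr(capture(TEMPERED, text)), repr(capture(UNROLLED, text)))
```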

Upvotes: 2

David542

Reputation: 110412

What about using a greedy match instead? For example:

robert (.+) palmer

Otherwise (with a lazy match), it could stop at the first occurrence of palmer instead of the last. Example here.
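A minimal Python sketch (standard `re` module; the pattern names are illustrative) shows the difference this answer describes. The greedy .+ first consumes the rest of the string and then backtracks to the last " palmer", while the lazy .+? stops at the first one:

```python
import re

GREEDY = re.compile(r"robert (.+) palmer")
LAZY = re.compile(r"robert (.+?) palmer")

text = "robert eric william palmer palmer sent for the boat"
print(repr(GREEDY.search(text).group(1)))
print(repr(LAZY.search(text).group(1)))
```

Here the greedy group is 'eric william palmer' and the lazy group is 'eric william'. Note that without the static tail in the pattern, neither version checks the rest of the string.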


Upvotes: 0
