Adam Eckstein

Regex Capture group causing issue with optional pattern in regex with negative look-ahead

I'm using capture groups to extract specific pieces of a string. This has worked before, but now that I have optional groups (made optional with the ? quantifier) I am getting unexpected results.

I am trying to capture "Critical Care Medicine" as a capture group within a string, allowing the abbreviation CRIT and making MEDICINE optional. The match should be excluded if it is followed by "AND".

https://regex101.com/r/MeWB7J/1

REGEX: .*((?:(?:\bCRIT(?:ICAL)?\W*CARE\W*(?:MEDICINE)?))(?!\sAND)).*

If I pass it CRITICAL CARE MEDICINE, CRITICAL CARE, or CRIT CARE, it works fine and I get the expected results in my capture group. However, if I pass "CRITICAL CARE MEDICINE AND", my capture group is "CRITICAL CARE". If I pass "CRIT CARE AND", I get "CRIT CARE". I'm lost as to why the negative lookahead isn't working; it seems to be treated as "ignore that part of the pattern".
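A short Python sketch reproduces the problem. It uses the standard re module instead of regex101's default PCRE flavor; for this pattern the backtracking behavior should be the same. It also suggests why the lookahead seems to be ignored. The \W* after CARE consumes the space in front of AND, so (?!\sAND) is tested at "AND" rather than " AND" and passes. When MEDICINE is followed by AND, the lookahead fails at first, but the engine backtracks, leaves the optional (?:MEDICINE)? out, and the lookahead then passes at "MEDICINE AND". In both cases the captured text ends with a trailing space. The helper name capture is only for illustration.

```python
import re

# The pattern from the question, unchanged.
QUESTION_PATTERN = re.compile(
    r".*((?:(?:\bCRIT(?:ICAL)?\W*CARE\W*(?:MEDICINE)?))(?!\sAND)).*"
)

def capture(text):
    """Return group 1 of the question's pattern, or None if there is no match."""
    m = QUESTION_PATTERN.search(text)
    return m.group(1) if m else None

samples = [
    "CRITICAL CARE MEDICINE",      # -> 'CRITICAL CARE MEDICINE'
    "CRITICAL CARE MEDICINE AND",  # -> 'CRITICAL CARE ' (MEDICINE dropped by backtracking)
    "CRIT CARE AND",               # -> 'CRIT CARE ' (\W* ate the space before AND)
]
for text in samples:
    # repr() makes the trailing space in the captured text visible.
    print(repr(text), "->", repr(capture(text)))
```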

Answers (1)

The fourth bird

After matching CARE, assert that it is not followed by AND (optionally preceded by MEDICINE). Only after that check, optionally capture MEDICINE in the capture group.

Note that \W and \s can also match a newline.
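A quick check of that note with Python's re module (assuming, as I believe is the case, that regex101's PCRE flavor treats these classes the same way): both \W and \s match a line feed, while . does not unless DOTALL is enabled. So the CRIT...CARE part can span a line break even though the surrounding .* cannot.

```python
import re

# \W (any non-word character) and \s (any whitespace) both match a line feed...
print(bool(re.fullmatch(r"\W", "\n")))  # True
print(bool(re.fullmatch(r"\s", "\n")))  # True
# ...while "." does not, unless re.DOTALL is used.
print(bool(re.fullmatch(r".", "\n")))   # False

# So \W* lets CRITICAL and CARE sit on different lines.
m = re.search(r"\bCRIT(?:ICAL)?\W*CARE\b", "CRITICAL\nCARE")
print(repr(m.group(0)))  # 'CRITICAL\nCARE'
```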

.*\b(CRIT(?:ICAL)?\W*CARE\b(?!(?:\s+MEDICINE\b)?\s+AND\b)(?:\s+MEDICINE\b)?).*

The pattern matches:

  • .* Match the whole line, then backtrack until the rest of the pattern can match
  • \b A word boundary to prevent a partial word match
  • ( Capture group 1
    • CRIT(?:ICAL)? Match CRIT or CRITICAL
    • \W*CARE\b Match optional non-word characters, then match the word CARE
    • (?! Negative lookahead, assert that what is directly to the right is not
      • (?:\s+MEDICINE\b)? Optionally match MEDICINE
      • \s+AND\b Match AND
    • ) Close the lookahead
    • (?:\s+MEDICINE\b)? Optionally match MEDICINE
  • ) Close group 1
  • .* Match the rest of the line

See a regex demo.
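The answer's pattern can also be checked against the inputs from the question with a minimal Python sketch. Python's re module stands in for the regex101 demo here, and the helper name capture_answer and the sample list are illustrative only. Inputs ending in AND should no longer match at all, while the other inputs capture MEDICINE only when it is present.

```python
import re

# The pattern from this answer, unchanged.
ANSWER_PATTERN = re.compile(
    r".*\b(CRIT(?:ICAL)?\W*CARE\b(?!(?:\s+MEDICINE\b)?\s+AND\b)(?:\s+MEDICINE\b)?).*"
)

def capture_answer(text):
    """Return capture group 1, or None when the pattern does not match."""
    m = ANSWER_PATTERN.search(text)
    return m.group(1) if m else None

samples = [
    "CRITICAL CARE MEDICINE",      # -> 'CRITICAL CARE MEDICINE'
    "CRITICAL CARE",               # -> 'CRITICAL CARE'
    "CRIT CARE",                   # -> 'CRIT CARE'
    "CRITICAL CARE MEDICINE AND",  # -> None: the lookahead sees " MEDICINE AND"
    "CRIT CARE AND",               # -> None: the lookahead sees " AND"
]
for text in samples:
    print(repr(text), "->", repr(capture_answer(text)))
```

Unlike the original pattern, nothing between CARE\b and the lookahead can consume the whitespace in front of AND, and leaving out the optional MEDICINE afterwards does not help, because the lookahead has already checked for MEDICINE followed by AND itself.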
