Hua

Reputation: 184

How to avoid a specific pattern when using regular expression?

I want to match patterns like '2 years' or '4 days' in a text, while skipping patterns like '2 years old', i.e., I don't want 'old' to follow 'years'. I thought a negative lookahead (?!old) would help, but I don't know how to apply it. I tried

r=regex.compile(r'\b(\d+)\s*(years?|months?|days?)\s*(?!old)\b')

but it still matches '2 years'.

Upvotes: 1

Views: 73

Answers (2)

The fourth bird

Reputation: 163207

For a full match you can omit the capture groups. If there should be at least one whitespace char between the digits and the words, use \s+ to match one or more of them.

To prevent partial matches, you can use word boundaries \b

\b\d+\s+(?:year|month|day)s?\b(?!\s+old\b)

The pattern matches

  • \b\d+\s+ A word boundary, match 1+ digits and 1+ whitespace chars
  • (?:year|month|day)s?\b Match any of the alternatives and optional s
  • (?!\s+old\b) Negative lookahead, assert that what follows to the right is not 1+ whitespace chars, then old and a word boundary
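
As a quick check of the breakdown above, here is a minimal sketch using Python's built-in re module (the question uses the third-party regex module, which should accept the same pattern). The sample sentence is made up for illustration.

```python
import re

# Pattern from this answer; no capture groups, so findall returns full matches.
pattern = re.compile(r'\b\d+\s+(?:year|month|day)s?\b(?!\s+old\b)')

# Hypothetical sample text, not taken from the question.
text = "She is 2 years old, left 4 days ago and returns in 3 months."

matches = pattern.findall(text)
print(matches)  # ['4 days', '3 months']
```

'2 years' is skipped because the lookahead rejects it, and backtracking to '2 year' is blocked by the \b after the optional s.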


Upvotes: 2

Sundeep

Reputation: 23667

Put \s* inside the lookahead:

r'\b(\d+)\s*(years?|months?|days?)(?!\s*old)\b'

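As far as I understand, your regexp backtracked and matched the trailing \s* zero times for the 2 years old case. At that position the negative lookahead (?!old) succeeds, since the content after 2 years is a space followed by old rather than old itself, and \b matches because 2 years ends at a word boundary. With \s* inside the lookahead, the check always starts right after the unit, and the trailing \b stops the regexp from backing off to 2 year.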
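
To see the difference, here is a small sketch comparing the two patterns with Python's built-in re module (assumed to behave like the regex module for these patterns). The variable names are only illustrative.

```python
import re

# Pattern from the question: \s* sits outside the lookahead.
original = re.compile(r'\b(\d+)\s*(years?|months?|days?)\s*(?!old)\b')
# Pattern from this answer: \s* moved inside the lookahead.
fixed = re.compile(r'\b(\d+)\s*(years?|months?|days?)(?!\s*old)\b')

text = "2 years old"

m_original = original.search(text)
m_fixed = fixed.search(text)

print(m_original.group(0))  # 2 years
print(m_fixed)              # None

# The fixed pattern still matches when 'old' does not follow.
print(fixed.search("4 days").group(0))  # 4 days
```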

Upvotes: 1
