Reputation: 184
I want to match patterns like '2 years' or '4 days' in a text, while avoiding patterns like '2 years old', i.e., I don't want an 'old' following 'years'. I thought a negative lookahead (?!old) would help, but I don't know how to apply it. I tried
r=regex.compile(r'\b(\d+)\s*(years?|months?|days?)\s*(?!old)\b')
but it still matches '2 years'.
Upvotes: 1
Views: 73
Reputation: 163207
If you only need the full match, you can omit the capture groups. If there should be at least one whitespace char between the digits and the word, use \s+ to match 1 or more of them.
To prevent partial matches, you can use word boundaries \b:
\b\d+\s+(?:year|month|day)s?\b(?!\s+old\b)
The pattern matches:
\b\d+\s+
A word boundary, then 1+ digits and 1+ whitespace chars
(?:year|month|day)s?\b
Any of the alternatives, an optional s, then a word boundary
(?!\s+old\b)
Negative lookahead: assert that what follows is not 1+ whitespace chars, then old and a word boundary
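A quick way to check this pattern is to run it over a sentence that contains both wanted and unwanted phrases. The sketch below assumes Python's standard `re` module (the question's `regex.compile` may refer to a different module or an alias), and the sample text is made up for illustration.

```python
import re

# Pattern from this answer: the lookahead itself consumes the whitespace before "old".
DURATION = re.compile(r'\b\d+\s+(?:year|month|day)s?\b(?!\s+old\b)')

# Hypothetical sample text, not taken from the question.
text = "He is 2 years old, moved 4 days ago and stayed 3 months."

# No capture groups, so findall returns the full matches.
matches = DURATION.findall(text)
print(matches)  # ['4 days', '3 months']

# The \b after "old" means words that merely start with "old" are not excluded.
older = DURATION.search("2 years older")
print(older.group(0))  # 2 years
```

Whether '2 years older' should count as a match depends on the use case; dropping the \b inside the lookahead would exclude it as well.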
Upvotes: 2
Reputation: 23667
Put \s* inside the lookahead:
r'\b(\d+)\s*(years?|months?|days?)(?!\s*old)\b'
As far as I understand, your regexp matched \s* zero times in the 2 years old case. After backtracking, the negative lookahead (?!old) succeeds, because 2 years ends at a word boundary and what follows at that position is a space and then old, not old itself.
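The backtracking described here can be seen by running both patterns on the same input. This is a minimal sketch assuming Python's standard `re` module rather than whatever `regex` refers to in the question; the variable names are only for illustration.

```python
import re

# Pattern from the question: \s* sits outside the lookahead and can match zero times.
original = re.compile(r'\b(\d+)\s*(years?|months?|days?)\s*(?!old)\b')
# Pattern from this answer: \s* moved inside the lookahead.
fixed = re.compile(r'\b(\d+)\s*(years?|months?|days?)(?!\s*old)\b')

sample = "2 years old"

# The original backtracks to an empty \s*, so (?!old) sees " old" and passes.
original_match = original.search(sample)
print(repr(original_match.group(0)))  # '2 years'

# With the whitespace inside the lookahead, the match is rejected.
fixed_match = fixed.search(sample)
print(fixed_match)  # None

# Phrases without a trailing "old" still match.
plain_match = fixed.search("4 days")
print(plain_match.group(0))  # 4 days
```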
Upvotes: 1