Moondra

Reputation: 4521

negative look ahead producing unexpected result

I was expecting an empty list, because I'm specifically negating the word 'authentication', which is in my string.

import re

string = 'INFO 2013-09-17 12:13:44,487 authentication failed'

pattern = re.compile(r'\w+\s[\d-]+\s[\d:,]+\s(.*(?!authentication\s)failed)')

re.findall(pattern, string)
['authentication failed']

Can someone explain why this is failing?

Upvotes: 1

Views: 36

Answers (1)

Martijn Pieters

Reputation: 1124100

Your .* pattern matches anything before failed. The negative lookahead is then tested at the position where .* stopped, and it only requires that the text starting there is not authentication followed by one whitespace character. That restriction is met easily: once .* has consumed 'authentication ', the text at that position is failed, not authentication.
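To see this concretely, here is a small sketch that reuses the pattern from the question and prints what the lookahead actually gets to look at. The names text, check_pos and seen_by_lookahead are just illustrative, and computing the position from the end of the group assumes the capture ends with the literal failed, which this pattern guarantees:

```python
import re

text = 'INFO 2013-09-17 12:13:44,487 authentication failed'

# The pattern from the question: the negative lookahead sits
# between .* and the literal "failed".
lookahead = re.compile(r'\w+\s[\d-]+\s[\d:,]+\s(.*(?!authentication\s)failed)')

match = lookahead.search(text)
captured = match.group(1)

# The lookahead is tested right where .* stopped, i.e. just before
# the literal "failed" at the end of the captured group.
check_pos = match.end(1) - len('failed')
seen_by_lookahead = text[check_pos:]

print(captured)           # authentication failed
print(seen_by_lookahead)  # failed
```

Since the text at that position starts with failed rather than authentication, the negative lookahead succeeds and the match goes through.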

Invert the lookahead; use a negative lookbehind ((?<!...)) instead. Only match failed if it is not directly preceded by authentication:

pattern = re.compile(r'\w+\s[\d-]+\s[\d:,]+\s(.*(?<!authentication\s)failed)')

Now the text doesn't match; the only failed in the string is directly preceded by authentication and a space, so the lookbehind rejects it, and there is no other failed for the .* to stop in front of.

I've put a demo at https://regex101.com/r/yGW7rH/1; note that the second line with the text matching failed results in a match, while authentication failed does not.
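The same check can be run locally. This is only a sketch: the second log line below is a made-up example for comparison (it is not copied from the demo), and the variable names are illustrative:

```python
import re

# Same pattern as above, with the negative lookbehind before "failed".
lookbehind = re.compile(r'\w+\s[\d-]+\s[\d:,]+\s(.*(?<!authentication\s)failed)')

auth_line = 'INFO 2013-09-17 12:13:44,487 authentication failed'
# Hypothetical second line, invented here for comparison.
other_line = 'INFO 2013-09-17 12:13:44,487 connection failed'

auth_result = lookbehind.findall(auth_line)
other_result = lookbehind.findall(other_line)

print(auth_result)   # []
print(other_result)  # ['connection failed']
```

Note that Python's re module requires lookbehind patterns to have a fixed width; authentication\s qualifies because it always matches exactly 15 characters.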

Upvotes: 1
