negative look ahead producing unexpected result

Question

I was expecting an empty string, because I'm specifically negating the word 'authentication' which is within my string.

import re

string = 'INFO 2013-09-17 12:13:44,487 authentication failed'

pattern = re.compile(r'\w+\s[\d-]+\s[\d:,]+\s(.*(?!authentication\s)failed)')

re.findall(pattern, string)
['authentication failed']

Can someone explain why this is failing?

Martijn Pieters · Accepted Answer

Your .* pattern matches everything before failed, here 'authentication '. The negative lookahead is then tested at the position where .* stops, and it only requires that the text starting there is not authentication followed by one whitespace character. That restriction is met easily: the text at that position is failed, not authentication.
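To make that concrete, here is a small sketch that reruns the pattern from the question and looks at the position where the lookahead is tested. The variable names (line, lookahead, check_pos) are only illustrative.

```python
import re

line = 'INFO 2013-09-17 12:13:44,487 authentication failed'

# Pattern from the question: the negative lookahead sits right after .*
lookahead = re.compile(r'\w+\s[\d-]+\s[\d:,]+\s(.*(?!authentication\s)failed)')

m = lookahead.match(line)

# The lookahead is evaluated where .* stops, i.e. just before 'failed'.
check_pos = m.end(1) - len('failed')

print(m.group(1))               # authentication failed
print(repr(line[check_pos:]))   # 'failed'
```

The text seen by the lookahead is failed, which is not authentication plus whitespace, so the assertion succeeds and the whole line matches.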

Invert the lookahead; use a negative lookbehind ((?<!...)) instead. Only match failed if it is not directly preceded by authentication:



pattern = re.compile(r'\w+\s[\d-]+\s[\d:,]+\s(.*(?<!authentication\s)failed)')


Now the text doesn't match: the only failed in the line is directly preceded by authentication and a space, so the lookbehind rejects it, and there is no other position where .* could stop and still be followed by a valid failed.
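A quick way to check this locally is to run a negative-lookbehind version of the pattern against two lines. This is a minimal sketch: the second sample line is a made-up example for illustration, not necessarily the one used in any online demo.

```python
import re

# Same pattern as in the question, with (?<!...) in place of (?!...).
lookbehind = re.compile(r'\w+\s[\d-]+\s[\d:,]+\s(.*(?<!authentication\s)failed)')

lines = [
    'INFO 2013-09-17 12:13:44,487 authentication failed',
    # Hypothetical second log line, invented for this example.
    'INFO 2013-09-17 12:13:44,487 something else failed',
]

results = [lookbehind.findall(line) for line in lines]
for line, found in zip(lines, results):
    print(found)
# []
# ['something else failed']
```

Python's re module requires lookbehind patterns to have a fixed width; authentication\s qualifies because it always matches exactly 15 characters.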

I've put a demo at https://regex101.com/r/yGW7rH/1; note that the second line with the text matching failed results in a match, while authentication failed does not.
