Reputation: 4521
I was expecting an empty string, because I'm specifically negating the word 'authentication' which is within my string.
import re
string = 'INFO 2013-09-17 12:13:44,487 authentication failed'
pattern = re.compile(r'\w+\s[\d-]+\s[\d:,]+\s(.*(?!authentication\s)failed)')
re.findall(pattern, string)
['authentication failed']
Can someone explain why this is failing?
Upvotes: 1
Views: 36
Reputation: 1124100
Your .* pattern matches anything before failed. The lookahead only requires that, at the position where .* stops, the text that follows is not authentication plus one whitespace character. That restriction is met easily: what follows at that position is failed, because the .* has already consumed 'authentication '.
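To see where the lookahead is actually tested, here is a rough sketch that reuses the string and pattern from the question. The variable names (line, pos, upcoming, lookahead_ok) are mine and only there for illustration.

```python
import re

line = 'INFO 2013-09-17 12:13:44,487 authentication failed'

# The lookahead is evaluated where .* stops, i.e. right where 'failed' begins.
pos = line.index('failed')
upcoming = line[pos:]

# The text ahead of that position is 'failed', not 'authentication ',
# so the negative lookahead succeeds there.
lookahead_ok = re.match(r'(?!authentication\s)', upcoming) is not None

# The original pattern from the question, unchanged.
original = re.compile(r'\w+\s[\d-]+\s[\d:,]+\s(.*(?!authentication\s)failed)')
result = original.findall(line)

print(repr(upcoming), lookahead_ok, result)
```

The lookahead never gets to look at 'authentication ' at all, since that text lies behind the position being tested.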
Invert the lookahead; use a negative lookbehind ((?<!...)) instead. Only match failed if it is not directly preceded by authentication:
pattern = re.compile(r'\w+\s[\d-]+\s[\d:,]+\s(.*(?<!authentication\s)failed)')
Now the text doesn't match; the .* can't match anything as there is no valid failed text following it that is not also preceded by authentication.
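A quick way to check the lookbehind version locally is sketched below. The second log line ('connection failed') is a made-up example of mine, not taken from the question or the demo.

```python
import re

# Lookbehind version of the pattern, as given in the answer.
fixed = re.compile(r'\w+\s[\d-]+\s[\d:,]+\s(.*(?<!authentication\s)failed)')

auth_line = 'INFO 2013-09-17 12:13:44,487 authentication failed'
# Hypothetical second line, invented for this example.
other_line = 'INFO 2013-09-17 12:13:44,487 connection failed'

# 'failed' is directly preceded by 'authentication ', so the lookbehind rejects it.
no_match = fixed.findall(auth_line)

# Here 'failed' is preceded by 'connection ', so the lookbehind allows the match.
match = fixed.findall(other_line)

print(no_match, match)
```

Keep in mind that Python's re module only accepts fixed-width lookbehinds; authentication\s qualifies because it always spans exactly 15 characters.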
I've put a demo at https://regex101.com/r/yGW7rH/1; note that the second line with the text matching failed results in a match, while authentication failed does not.
Upvotes: 1