Reputation: 12838
I have the following text:
#{king} for a ##{day}, ##{fool} for a #{lifetime}
And the following (broken) regex:
[^#]#{[a-z]+}
I want to match all #{words} but not the ##{words} (doubling '#' acts like escaping).
Today I noticed that my regex skips the first word: it fails to match #{king}, but correctly ignores ##{day} and ##{fool}.
>>> import re
>>> string = u"#{king} for a ##{day}, ##{fool} for a #{lifetime}"
>>> regex = re.compile("[^#]#{[a-z]+}")
>>> regex.findall(string)
[u' #{lifetime}']
Any suggestions on how to improve the current regex to suit my needs?
I guess the problem is with [^#]
...
Upvotes: 5
Views: 137
Reputation: 18029
You have to use a "negative lookbehind assertion". The correct regex would look like this:
import re
t = "#{king} for a ##{day}, ##{fool} for a #{lifetime}"
re.findall(r'(?<!#)#{([a-z]+)}', t)
returns
['king', 'lifetime']
Explanation:
The expression (?<!prefix)pattern matches pattern only if it is not preceded by prefix.
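As a rough illustration of the lookbehind described above, here is a small sketch that compiles the pattern once and pulls out both the captured names and the full placeholders. The variable names (text, pattern, names, full) are made up for the example, and the braces are escaped only for readability.

```python
import re

text = "#{king} for a ##{day}, ##{fool} for a #{lifetime}"

# (?<!#) asserts that the character before '#' is not another '#'.
# The assertion consumes nothing, so it also succeeds at the very
# start of the string, where there is no preceding character at all.
pattern = re.compile(r"(?<!#)#\{([a-z]+)\}")

# findall returns only the captured group when the pattern has one group.
names = pattern.findall(text)

# finditer gives match objects, so group(0) is the whole placeholder.
full = [m.group(0) for m in pattern.finditer(text)]

print(names)
print(full)
```

Because the lookbehind is zero-width, the match never includes the character in front of the '#', which is why no leading space shows up in the results.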
Upvotes: 6
Reputation: 189447
Replace [^#] with (?:^|[^#]). As you inferred, [^#] on its own has to match one character that is not #, and there is no such character at the beginning of the line.
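One thing to keep in mind with this approach is that [^#] consumes the character it matches. The sketch below (variable names are illustrative, not from the original post) adds a capturing group so the leading character is left out of the result, and compares the behaviour with a lookbehind on a hypothetical input where two placeholders sit directly next to each other.

```python
import re

text = "#{king} for a ##{day}, ##{fool} for a #{lifetime}"

# (?:^|[^#]) consumes either nothing at the start of the string or one
# non-'#' character; the capturing group keeps only the placeholder.
consuming = re.compile(r"(?:^|[^#])(#\{[a-z]+\})")
placeholders = consuming.findall(text)

# Hypothetical edge case: two placeholders with nothing between them.
adjacent = "#{a}#{b}"

# The second placeholder is missed: the search resumes right at its '#',
# which [^#] cannot match, and ^ only matches at position 0.
consumed_adjacent = consuming.findall(adjacent)

# A zero-width lookbehind does not consume anything, so both are found.
lookbehind = re.compile(r"(?<!#)(#\{[a-z]+\})")
lookbehind_adjacent = lookbehind.findall(adjacent)

print(placeholders)
print(consumed_adjacent)
print(lookbehind_adjacent)
```

If placeholders can never be adjacent in your input, either pattern should do; otherwise the lookbehind is probably the safer choice.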
Upvotes: 1
Reputation: 222198
>>> regex = re.compile("(?:^|[^#])#{[a-z]+}")
>>> regex.findall(string)
['#{king}', ' #{lifetime}']
>>>
Upvotes: 2
Reputation: 9466
Use a look-behind construction:
>>> s='#{king} for a ##{day}, ##{fool} for a #{lifetime}'
>>> r=re.compile(r'(?:^|(?<=[^#]))#{\w+}')
>>> r.findall(s)
['#{king}', '#{lifetime}']
>>>
Upvotes: 2