Andrei Ciobanu

Reputation: 12838

Retrieve text inside #{ }

I have the following text:

#{king} for a ##{day}, ##{fool} for a #{lifetime}

And the following (broken) regex:

[^#]#{[a-z]+}

I want to match every #{word} but not ##{word} (doubling the '#' acts as an escape).

Today I noticed that my regex ignores the first word: it refuses to match #{king}, although it correctly ignores ##{day} and ##{fool}.

>>> import re
>>> string = u"#{king} for a ##{day}, ##{fool} for a #{lifetime}"
>>> regex = re.compile("[^#]#{[a-z]+}")
>>> regex.findall(string)
[u' #{lifetime}']

Any suggestions on how to improve the regex to suit my needs? I guess the problem is with [^#]...
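A quick check (a sketch assuming Python 3's re module) suggests the guess is right: [^#] has to consume exactly one character, so a placeholder sitting at position 0 can never match, and when a match does succeed it drags that extra character along.

```python
import re

pattern = re.compile(r"[^#]#{[a-z]+}")

# At position 0 there is no character for [^#] to consume, so this fails.
at_start = pattern.match("#{king}")
print(at_start)  # None

# With any non-# character in front, the placeholder matches,
# but the match includes that leading character.
with_space = pattern.match(" #{king}").group(0)
print(repr(with_space))  # ' #{king}'
```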

Upvotes: 5

Views: 137

Answers (5)

user906780

Reputation:

Try this:

re.compile(r'^#\{\w+\}')
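As written, the ^ anchor ties the match to the very start of the string (no re.MULTILINE flag is set), so on the sample text from the question this pattern finds only the first placeholder. A minimal check, assuming Python 3:

```python
import re

t = "#{king} for a ##{day}, ##{fool} for a #{lifetime}"

# ^ only matches at position 0 here, so #{lifetime} is never reached.
matches = re.compile(r'^#\{\w+\}').findall(t)
print(matches)  # ['#{king}']
```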

Upvotes: 0

mdeous

Reputation: 18029

Use a "negative lookbehind assertion"; the regex would look like this:

import re
t = "#{king} for a ##{day}, ##{fool} for a #{lifetime}"
re.findall(r'(?<!#)#{([a-z]+)}', t)

returns

['king', 'lifetime']

Explanation:

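The (?<!prefix)pattern expression matches pattern only if it is not preceded by prefix. The lookbehind is zero-width: it checks the character before the current position without consuming it, so it also succeeds at the start of the string, where nothing precedes #{king}.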
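A small sketch on the same sample t (assuming Python 3), using re.finditer to show the whole matches, the captured words, and their offsets. Each match starts at the # itself; no neighbouring character is included.

```python
import re

t = "#{king} for a ##{day}, ##{fool} for a #{lifetime}"
placeholder = re.compile(r'(?<!#)#\{([a-z]+)\}')

# group(0) is the whole placeholder, group(1) the captured word,
# span() the (start, end) offsets within t.
found = [(m.group(0), m.group(1), m.span()) for m in placeholder.finditer(t)]
for item in found:
    print(item)
```

Note that findall returns only the captured group when the pattern has one, which is why the answer's call yields bare words rather than full placeholders.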

Upvotes: 6

tripleee

Reputation: 189447

Replace [^#] with (?:^|[^#]). As you suspected, [^#] must match exactly one character that is not #, and at the start of the string there is no such character in front of #{king}.
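A sketch of this fix on the question's sample text (assuming Python 3). Because (?:^|[^#]) still consumes the preceding character, the space before #{lifetime} ends up in the match; wrapping the word in a capturing group is one way to get clean names back from findall.

```python
import re

t = "#{king} for a ##{day}, ##{fool} for a #{lifetime}"

# (?:^|[^#]) matches either the start of the string or one non-# character.
whole = re.findall(r'(?:^|[^#])#\{[a-z]+\}', t)
words = re.findall(r'(?:^|[^#])#\{([a-z]+)\}', t)
print(whole)  # ['#{king}', ' #{lifetime}']
print(words)  # ['king', 'lifetime']
```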

Upvotes: 1

Dogbert

Reputation: 222198

>>> regex = re.compile("(?:^|[^#])#{[a-z]+}")
>>> regex.findall(string)
['#{king}', ' #{lifetime}']
>>>
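One side effect of consuming the preceding character shows up with adjacent placeholders (the input below is a made-up example, run under Python 3). Once #{a} is matched, its closing } is already used up, so #{b} has nothing left for (?:^|[^#]) to match. A zero-width negative lookbehind, as in the other answers, does not have this problem.

```python
import re

s = "#{a}#{b}"  # hypothetical input with two adjacent placeholders

consuming = re.findall(r'(?:^|[^#])#\{[a-z]+\}', s)
lookbehind = re.findall(r'(?<!#)#\{[a-z]+\}', s)
print(consuming)   # ['#{a}']
print(lookbehind)  # ['#{a}', '#{b}']
```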

Upvotes: 2

Maxim Razin

Reputation: 9466

Use a look-behind construction:

>>> import re
>>> s='#{king} for a ##{day}, ##{fool} for a #{lifetime}'
>>> r=re.compile(r'(?:^|(?<=[^#]))#{\w+}')
>>> r.findall(s)
['#{king}', '#{lifetime}']
>>>
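This construction accepts exactly the positions that the negative lookbehind (?<!#) accepts: ^ covers position 0, and (?<=[^#]) covers every later position whose preceding character is not #. A small sketch comparing the two under Python 3 (the extra sample strings are made up for illustration). Note that \w also accepts digits and underscores, unlike [a-z].

```python
import re

maxim = re.compile(r'(?:^|(?<=[^#]))#\{\w+\}')
negative = re.compile(r'(?<!#)#\{\w+\}')

samples = [
    "#{king} for a ##{day}, ##{fool} for a #{lifetime}",
    "#{a}#{b}",
    "##{escaped} and #{user_1}",
]

for text in samples:
    # Both patterns should report the same placeholders for each input.
    print(maxim.findall(text), negative.findall(text))
```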

Upvotes: 2
