Reputation: 3167
I'm searching a file line by line for occurrences of ##random_string##. It works except for the case of multiple #...
import re
pattern = r'##(.*?)##'
prog = re.compile(pattern)
string = 'lala ###hey## there'
print(prog.sub('FOUND', string))
Desired output:
"lala #FOUND there"
Instead I get the following, because it's grabbing the whole ###hey##:
"lala FOUND there"
How can I ignore any extra # at the beginning or end and match only "##string##"?
Upvotes: 0
Views: 323
Reputation: 96081
>>> import re
>>> text = 'lala ###hey## there'
>>> matcher = re.compile(r"##[^#]+##")
>>> print(matcher.sub("FOUND", text))
lala #FOUND there
>>>
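The question mentions scanning a file line by line, so here is a minimal sketch that applies the same pattern to every line. It assumes an iterable of lines (io.StringIO stands in for a real file object), and replace_tokens is a hypothetical helper name, not part of re.

```python
import io
import re

# Two hashes, one or more non-# characters, two hashes (same pattern as above).
TOKEN = re.compile(r"##[^#]+##")


def replace_tokens(lines, replacement="FOUND"):
    """Hypothetical helper: yield each line with every ##token## replaced."""
    for line in lines:
        yield TOKEN.sub(replacement, line)


# io.StringIO stands in for an open file object here.
fake_file = io.StringIO("lala ###hey## there\nno tokens here\n##a## and ##b##\n")
result = list(replace_tokens(fake_file))
for line in result:
    print(line, end="")
```

With a real file, the same generator can be fed the file object from open(...) directly, since iterating over a file yields its lines.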
Upvotes: 0
Reputation: 42845
Your problem is with your inner match. You use ., which matches any character except a newline, and that means it matches # as well. So in ###hey## the match starts at the first ##, and (.*?) captures #hey.
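This can be checked directly. A short sketch using the question's original pattern and sample line shows where the match starts and what the lazy group ends up capturing:

```python
import re

text = "lala ###hey## there"
m = re.search(r"##(.*?)##", text)

print(m.group(0))  # the whole match: ###hey##
print(m.group(1))  # the captured part: #hey
print(m.span())    # (5, 13): the match begins at the first '#'
```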
The easy solution is to exclude the # character from the matchable set:
prog = re.compile(r'##([^#]*)##')
Protip: Use raw strings (e.g. r'') for regular expressions so you don't have to go crazy with backslash escapes.
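As an illustration of the raw-string tip: in an ordinary string literal \b is the backspace character, not the regex word-boundary escape, so the two patterns below look alike but behave differently. The sample text is made up for this example.

```python
import re

text = "say hey now"

plain = "\bhey\b"  # "\b" here is a backspace character (\x08)
raw = r"\bhey\b"   # the regex engine sees \b, a word boundary

print(len(plain), len(raw))          # 5 7
print(re.search(plain, text))        # None: text contains no backspace characters
print(re.search(raw, text).group())  # hey
```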
Trying to allow # inside the hashes will make things much more complicated.
EDIT: If you do not want to allow blank inner text (i.e. "####" shouldn't match with an inner text of ""), then change it to:
prog = re.compile(r'##([^#]+)##')
The + means "one or more."
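A quick comparison of the two patterns, using "####" from the edit and the question's sample line:

```python
import re

star = re.compile(r"##([^#]*)##")  # zero or more inner characters
plus = re.compile(r"##([^#]+)##")  # one or more inner characters

print(repr(star.search("####").group(1)))           # '' : an empty token matches
print(plus.search("####"))                          # None: blank inner text is rejected
print(plus.search("lala ###hey## there").group(1))  # hey
```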
Upvotes: 3
Reputation: 343201
Have you considered doing it the non-regex way?
>>> string='lala ####hey## there'
>>> string.split("####")[1].split("#")[0]
'hey'
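The split chain above relies on the exact hash count in that sample: with the question's original 'lala ###hey## there' (three leading #), split("####") returns a single element and [1] raises IndexError. Below is a minimal non-regex sketch that tolerates any number of #, assuming tokens are separated by whitespace; hash_words is a hypothetical helper name.

```python
def hash_words(line):
    """Hypothetical helper: return the inner text of every whitespace-separated
    word that starts and ends with at least two '#' characters."""
    found = []
    for word in line.split():
        inner = word.strip("#")  # drop every leading and trailing '#'
        # Require ## on both ends and something left after stripping the hashes.
        if word.startswith("##") and word.endswith("##") and inner:
            found.append(inner)
    return found


print(hash_words("lala ###hey## there"))   # ['hey']
print(hash_words("lala ####hey## there"))  # ['hey']
print(hash_words("#one# ##two## ####"))    # ['two']
```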
Upvotes: 0
Reputation: 21130
'#{2,}([^#]*)#{2,}'
-- any number of # >= 2 on either end
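Be careful with lazy quantifiers like (.*?): with ##(.*?)##, the string '###abc##' matches and captures '#abc', because the lazy group has to absorb the extra #. Lazy quantifiers can also be slower than a negated character class such as [^#]*, since the engine extends them one character at a time and retries the rest of the pattern after each step.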
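A small side-by-side sketch of the question's lazy pattern and the negated-class pattern, written here without a ^ anchor so that re.search can find the token in the middle of a line. The inputs are sample strings chosen for illustration:

```python
import re

lazy = re.compile(r"##(.*?)##")             # the question's lazy pattern
negated = re.compile(r"#{2,}([^#]*)#{2,}")  # two or more # on each side, no # inside

for s in ["###abc##", "lala ###hey## there"]:
    # Print what each pattern captures for the same input.
    print(repr(lazy.search(s).group(1)), repr(negated.search(s).group(1)))
# '#abc' 'abc'
# '#hey' 'hey'
```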
Upvotes: 1
Reputation: 5804
Add + after each #; + means "match one or more" of the preceding character.
import re
pattern = r'#+(.*?)#+'
prog = re.compile(pattern)
string = '###HEY##'
result = prog.search(string)
print(result.group(1))
Output:
HEY
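Because + means "one or more", this pattern also accepts a single # as a delimiter, and it consumes every # around the word. A sketch with made-up inputs shows both effects; note that substitution on the question's line yields "lala FOUND there" rather than the "lala #FOUND there" the question asked for.

```python
import re

prog = re.compile(r"#+(.*?)#+")

print(prog.search("###HEY##").group(1))             # HEY
print(prog.search("lala ###hey## there").group(1))  # hey
print(prog.search("a #b# c").group(1))              # b: a single # on each side also counts
print(prog.sub("FOUND", "lala ###hey## there"))     # lala FOUND there
```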
Upvotes: 0
Reputation: 186118
To match at least two hashes at either end:
pattern='##+(.*?)##+'
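A quick check of this version, with illustrative inputs: a single # no longer counts as a delimiter, while the extra # in the question's line is still absorbed by the greedy ##+.

```python
import re

prog = re.compile(r"##+(.*?)##+")

print(prog.search("a #b# c"))                       # None: one # is not enough
print(prog.search("lala ###hey## there").group(1))  # hey
print(prog.sub("FOUND", "lala ###hey## there"))     # lala FOUND there
```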
Upvotes: 3