nubme

Reputation: 3167

Python Regular Expression Matching: ## ##

I'm searching a file line by line for the occurrence of ##random_string##. It works except for the case of multiple #...

import re

pattern = r'##(.*?)##'
prog = re.compile(pattern)

string = 'lala ###hey## there'
result = prog.search(string)

print(re.sub(result.group(1), 'FOUND', string))

Desired Output:

"lala #FOUND there"

Instead I get the following, because it's grabbing the whole ###hey##:

"lala FOUND there"

So how would I ignore any number of extra # at the beginning or end, and only capture "##string##"?

Upvotes: 0

Views: 323

Answers (7)

tzot

Reputation: 96081

>>> import re
>>> text = 'lala ###hey## there'
>>> matcher = re.compile(r"##[^#]+##")
>>> print(matcher.sub("FOUND", text))
lala #FOUND there
>>>
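Since the question searches a file line by line, here is a minimal sketch applying the same pattern to each line. It assumes the lines come from a text stream: `io.StringIO` stands in for a real `open(...)` file, and `replace_markers` is a hypothetical helper name.

```python
import io
import re

# "##", then one or more non-# characters, then "##"
MARKER = re.compile(r"##[^#]+##")

def replace_markers(lines):
    """Hypothetical helper: yield each line with every ##text## replaced by FOUND."""
    for line in lines:
        yield MARKER.sub("FOUND", line)

# io.StringIO stands in for open("some_file.txt") in this sketch
fake_file = io.StringIO("lala ###hey## there\nno markers here\n##a## and ##b##\n")
result = list(replace_markers(fake_file))
for line in result:
    print(line, end="")
```

Because `sub` replaces every match, a line with several markers is handled in one call.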

Upvotes: 0

Mike DeSimone

Reputation: 42845

Your problem is with your inner match. You use ., which matches any character that isn't a line end, and that means it matches # as well. So when it gets ###hey##, it matches (.*?) to #hey.
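A quick way to see this is to print the whole match and the group for both patterns on the question's string:

```python
import re

text = "lala ###hey## there"

# Original pattern: '.' also matches '#', so the match starts at the first '##'
# and the lazy group has to absorb the third '#'.
loose = re.search(r"##(.*?)##", text)
print(loose.group(0), loose.group(1))    # ###hey## #hey

# Excluding '#' from the inner text moves the match one character to the right.
strict = re.search(r"##([^#]*)##", text)
print(strict.group(0), strict.group(1))  # ##hey## hey
```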

The easy solution is to exclude the # character from the matchable set:

prog = re.compile(r'##([^#]*)##')

Protip: Use raw strings (e.g. r'') for regular expressions so you don't have to go crazy with backslash escapes.
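A small illustration of why this matters: in a normal string literal `\b` is a backspace character, not the regex word-boundary escape.

```python
import re

text = "lala hey there"

# In a normal literal, "\b" is the backspace character (\x08), so this pattern
# looks for literal backspaces and finds nothing.
plain = re.search("\bhey\b", text)

# In a raw literal the backslash reaches the regex engine, so \b is a word boundary.
raw = re.search(r"\bhey\b", text)

print(plain)        # None
print(raw.group())  # hey
```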

Trying to allow # inside the hashes will make things much more complicated.

EDIT: If you do not want to allow blank inner text (i.e. "####" shouldn't match with an inner text of ""), then change it to:

prog = re.compile(r'##([^#]+)##')

+ means "one or more."
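The difference between the two versions only shows up on empty markers; on the question's string both give the desired result:

```python
import re

star = re.compile(r"##([^#]*)##")  # inner text may be empty
plus = re.compile(r"##([^#]+)##")  # inner text needs at least one character

print(repr(star.search("####").group(1)))  # '' ("####" matches with empty inner text)
print(plus.search("####"))                 # None

print(star.sub("FOUND", "lala ###hey## there"))  # lala #FOUND there
print(plus.sub("FOUND", "lala ###hey## there"))  # lala #FOUND there
```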

Upvotes: 3

ghostdog74

Reputation: 343201

Have you considered doing it the non-regex way?

>>> string='lala ####hey## there'
>>> string.split("####")[1].split("#")[0]
'hey'
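The split above depends on the exact `####` prefix in that example string. A slightly more general non-regex sketch, assuming markers are whitespace-separated tokens (`replace_tokens` is a hypothetical helper name, and joining with single spaces normalizes the original whitespace):

```python
def replace_tokens(text, replacement="FOUND"):
    """Hypothetical helper: replace tokens shaped like ##word## without using re.

    Any '#' beyond the first two / last two is kept, so '###hey##' becomes
    '#FOUND'. Assumes markers are separated from other text by whitespace.
    """
    out = []
    for token in text.split():
        inner = token.strip("#")
        lead = len(token) - len(token.lstrip("#"))   # number of leading '#'
        trail = len(token) - len(token.rstrip("#"))  # number of trailing '#'
        if inner and "#" not in inner and lead >= 2 and trail >= 2:
            token = "#" * (lead - 2) + replacement + "#" * (trail - 2)
        out.append(token)
    return " ".join(out)

print(replace_tokens("lala ###hey## there"))  # lala #FOUND there
```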

Upvotes: 0

glebm

Reputation: 21130

'^#{2,}([^#]*)#{2,}' -- two or more # on either end (the ^ anchors the match to the start of the string; drop it to match in the middle of a line)

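Be careful with (.*)-style quantifiers: a greedy ##(.*)## would match '##abc#####' and capture 'abc###'. The lazy (.*?) stops at the first closing ## there, but it still lets # into the capture, as with '###hey##'. A negated class like [^#]* is also usually cheaper than a lazy quantifier, which has to retry the rest of the pattern after every character it consumes.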
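A small comparison on the string from this answer, followed by the effect of the `^` anchor on the question's string:

```python
import re

text = "##abc#####"

greedy = re.search(r"##(.*)##", text).group(1)      # grabs as much as it can
lazy = re.search(r"##(.*?)##", text).group(1)       # stops at the first closing ##
negated = re.search(r"##([^#]*)##", text).group(1)  # inner text cannot contain #

print(greedy, lazy, negated)  # abc### abc abc

question = "lala ###hey## there"
# With ^ the match must start at position 0, so nothing is found in this line
anchored = re.search(r"^#{2,}([^#]*)#{2,}", question)
# Without ^ the pattern can start anywhere; #{2,} absorbs all three leading #
unanchored = re.search(r"#{2,}([^#]*)#{2,}", question)

print(anchored)             # None
print(unanchored.group(1))  # hey
```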

Upvotes: 1

Tg.

Reputation: 5804

Add + to the regex; it means "match one or more" of the preceding character.

import re

pattern = r'#+(.*?)#+'
prog = re.compile(pattern)

string = '###HEY##'
result = prog.search(string)
print(result.group(1))

Output:

HEY
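For the replacement the question asks for, `#+` behaves differently from `##[^#]+##`: it swallows every surrounding `#`, and it also accepts a single `#` on each side.

```python
import re

prog = re.compile(r"#+(.*?)#+")
text = "lala ###hey## there"

print(prog.search(text).group(1))     # hey
# With sub, #+ consumes all the surrounding '#', including the extra one
print(prog.sub("FOUND", text))        # lala FOUND there
# A single '#' on each side is enough for #+
print(prog.search("#hey#").group(1))  # hey
```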

Upvotes: 0

Ming-Tang

Reputation: 17659

Try the "block comment trick": /##((?:[^#]|#[^#])+?)##/
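A Python translation of this pattern (the `/.../` delimiters are not part of Python's syntax). It allows a single `#` inside the marker text, which also means that on the question's string it captures `#hey` rather than `hey`:

```python
import re

# Inside the markers: any non-# character, or a '#' followed by a non-# character
block = re.compile(r"##((?:[^#]|#[^#])+?)##")

print(block.search("a ##x#y## b").group(1))          # x#y
print(block.search("lala ###hey## there").group(1))  # #hey
```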

Upvotes: 0

Marcelo Cantos

Reputation: 186118

To match at least two hashes at either end:

pattern='##+(.*?)##+'
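Checking this pattern against the question's string and against markers with only one `#` on each side:

```python
import re

pattern = re.compile(r"##+(.*?)##+")

m = pattern.search("lala ###hey## there")
print(m.group(0), m.group(1))              # ###hey## hey
print(pattern.search("lala #hey# there"))  # None: one '#' on each side is not enough
```

Like `#+`, the leading `##+` absorbs the extra `#`, so `sub` with this pattern removes all three leading hashes.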

Upvotes: 3
