Reputation: 22282
For example, I have a string like '(10 + 20) / (10 + 20)'
.
And now I want to match (10 + 20)
. So I write a script like this:
text = '(10 + 20) / (10 + 20)'
test1 = re.findall(r'(.*)', text)
test2 = re.findall(r'(.+?)', text)
for i in test1:
print(i, end='')
else:
print()
for i in test2:
print(i, end='')
else:
print()
And the output is this:
(10 + 20) / (10 + 20)
(10 + 20) / (10 + 20)
I don't understand, doesn't .+?
not greedy?
Upvotes: 2
Views: 2639
Reputation: 414169
>>> import re
>>> re.findall(r'\([^()]+\)', '(10 + 20) / (10 + 20)')
['(10 + 20)', '(10 + 20)']
The dialect of regular expressions used in the re
module can't support arbitrary nested parentheses therefore [^()]
that matches everything except parentheses is always valid here.
Note: you don't need to escape ()
inside []
that defines a set of characters.
Upvotes: 2
Reputation: 626748
The round brackets in a regex pattern must be escaped with \
to match literal round brackets:
test2 = re.findall(r'\(.+?\)', text)
See demo
A "raw" string literal does not mean that you do not have to escape special regex characters but it means you can use just one backslash to escape them, not two.
See this excerpt from 6.2.5.8. Raw String Notation:
Raw string notation
(r"text")
keeps regular expressions sane. Without it, every backslash ('\'
) in a regular expression would have to be prefixed with another one to escape it. For example, the two following lines of code are functionally identical:
>>>
>>> re.match(r"\W(.)\1\W", " ff ")
<_sre.SRE_Match object; span=(0, 4), match=' ff '>
>>> re.match("\\W(.)\\1\\W", " ff ")
<_sre.SRE_Match object; span=(0, 4), match=' ff '>
The docs say usually, but it does not mean you have to use raw string literals every time.
It is true that .+?
is a lazy pattern, it means match 1 or more characters other than a newline, but as few as possible.
Upvotes: 4
Reputation: 1406
You need to escape (
and )
like this:
`\(.*\)`
and this
`\(.+?\)`.
The first one will match until it finds the last possible )
, the other one i non-greedy and will stop at the first )
Upvotes: 1