Remi Guan
Remi Guan

Reputation: 22282

`re` module match text between two pairs of brackets in Python3

For example, I have a string like '(10 + 20) / (10 + 20)'.

And now I want to match (10 + 20). So I write a script like this:

text = '(10 + 20) / (10 + 20)'                                                                                                          
test1 = re.findall(r'(.*)', text)                                            
test2 = re.findall(r'(.+?)', text)                                           

for i in test1:                                                              
    print(i, end='')                                                         
else:                                                                        
    print()                                                                  

for i in test2:                                                              
    print(i, end='')                                                         
else:                                                                        
    print()       

And the output is this:

(10 + 20) / (10 + 20)                                                                                                                       
(10 + 20) / (10 + 20)

I don't understand, doesn't .+? not greedy?

Upvotes: 2

Views: 2639

Answers (3)

jfs
jfs

Reputation: 414169

>>> import re
>>> re.findall(r'\([^()]+\)', '(10 + 20) / (10 + 20)')
['(10 + 20)', '(10 + 20)']

The dialect of regular expressions used in the re module can't support arbitrary nested parentheses therefore [^()] that matches everything except parentheses is always valid here.

Note: you don't need to escape () inside [] that defines a set of characters.

Upvotes: 2

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 626748

The round brackets in a regex pattern must be escaped with \ to match literal round brackets:

test2 = re.findall(r'\(.+?\)', text) 

See demo

A "raw" string literal does not mean that you do not have to escape special regex characters but it means you can use just one backslash to escape them, not two.

See this excerpt from 6.2.5.8. Raw String Notation:

Raw string notation (r"text") keeps regular expressions sane. Without it, every backslash ('\') in a regular expression would have to be prefixed with another one to escape it. For example, the two following lines of code are functionally identical:

>>>
>>> re.match(r"\W(.)\1\W", " ff ")
<_sre.SRE_Match object; span=(0, 4), match=' ff '>
>>> re.match("\\W(.)\\1\\W", " ff ")
<_sre.SRE_Match object; span=(0, 4), match=' ff '>

The docs say usually, but it does not mean you have to use raw string literals every time.

It is true that .+? is a lazy pattern, it means match 1 or more characters other than a newline, but as few as possible.

Upvotes: 4

Lawrence Benson
Lawrence Benson

Reputation: 1406

You need to escape ( and ) like this:

`\(.*\)`

and this

`\(.+?\)`. 

The first one will match until it finds the last possible ), the other one i non-greedy and will stop at the first )

Upvotes: 1

Related Questions