Giuseppe

Reputation: 447

Python - Return all substrings in the first group of nested parentheses

I want to find an efficient way to select all the sub-strings contained in the first group of nested parentheses from a string.

For example:

input: a d f gsds ( adsd ) adsdaa    
output: ( adsd )

input: adadsa ( sadad adsads ( adsda ) dsadsa ) ( dsadsad ) 
output: ( sadad adsads ( adsda ) dsadsa )

input: a ana anan anan ( adad ( sad ) sdada asdad ) ( sadad ( adasd ) asda ) sdafds ( afdasf )
output: ( adad ( sad ) sdada asdad )

Notice there could be multiple groups of nested parentheses.

One solution would be to scan the string character by character, incrementing a counter at each opening parenthesis and decrementing it at each closing one, and stopping when the counter returns to 0.
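For reference, a minimal sketch of that counter-based scan. The function name first_paren_group is hypothetical, and returning None when there is no opening parenthesis or the group is never closed is an assumption, not something the question specifies:

```python
def first_paren_group(s):
    """Return the first balanced parenthesised group in s, or None."""
    start = s.find('(')
    if start == -1:
        return None  # assumption: no '(' at all means no group

    depth = 0
    for i in range(start, len(s)):
        ch = s[i]
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth == 0:
                # The counter is back to zero: the group opened at
                # `start` closes here.
                return s[start:i + 1]

    return None  # assumption: an unclosed group yields None


print(first_paren_group('adadsa ( sadad adsads ( adsda ) dsadsa ) ( dsadsad )'))
```

This is a single pass over the string, so it is linear in the input length. It begins at the first opening parenthesis, so a stray closing parenthesis before that point is ignored.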

Is there a simpler way to do it, maybe with regular expressions?

Thanks

Upvotes: 2

Views: 732

Answers (2)

quasi-human

Reputation: 1928

You can use pyparsing to select all the sub-strings contained in the first group of nested parentheses from a string.

import pyparsing as pp

pattern = pp.Regex(r'.*?(?=\()') + pp.original_text_for(pp.nested_expr('(', ')'))

txt = 'a d f gsds ( adsd ) adsdaa'
result = pattern.parse_string(txt)[1]
assert result == '( adsd )'

txt = 'adadsa ( sadad adsads ( adsda ) dsadsa ) ( dsadsad )'
result = pattern.parse_string(txt)[1]
assert result == '( sadad adsads ( adsda ) dsadsa )'

txt = 'a ana anan anan ( adad ( sad ) sdada asdad ) ( sadad ( adasd ) asda ) sdafds ( afdasf )'
result = pattern.parse_string(txt)[1]
assert result == '( adad ( sad ) sdada asdad )'

* pyparsing can be installed with: pip install pyparsing

Note:

If the parentheses inside the string are unbalanced (for example a(b(c), a(b)c), etc.), an unexpected result is obtained or an IndexError is raised, so be careful. (See: Python extract string in a phrase)

Upvotes: 1

logic

Reputation: 1727

I wrote a little function:

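def parens(s):
    # Heuristic: it gives a wrong result when the first group contains
    # sibling groups, e.g. '( a ( b ) ( c ) d )'.
    start = s.find('(')
    i = s[start:s.find(')')].count('(')   # counts the number of '(' before the first ')'
    groups = s[start:].split(')')         # splits the string at every ')'
    print(')'.join(groups[:i]) + ')')     # rejoins the first i pieces with ')' (Python 3 print)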

Demo:

>>> parens('a d f gsds ( adsd ) adsdaa')
( adsd )

>>> parens('adadsa ( sadad adsads ( adsda ) dsadsa ) ( dsadsad )')
( sadad adsads ( adsda ) dsadsa )

>>> parens('a ana anan anan ( adad ( sad ) sdada asdad ) ( sadad ( adasd ) asda ) sdafds ( afdasf )')
( adad ( sad ) sdada asdad )

Upvotes: 2
