Reputation: 447
I want to find an efficient way to select all the sub-strings contained in the first group of nested parentheses from a string.
For example:
input: a d f gsds ( adsd ) adsdaa
output: ( adsd )
input: adadsa ( sadad adsads ( adsda ) dsadsa ) ( dsadsad )
output: ( sadad adsads ( adsda ) dsadsa )
input: a ana anan anan ( adad ( sad ) sdada asdad ) ( sadad ( adasd ) asda ) sdafds ( afdasf )
output: ( adad ( sad ) sdada asdad )
Notice there could be multiple groups of nested parentheses.
One solution would be to scan the string character by character, incrementing a counter at each opening parenthesis and decrementing it at each closing one, until the counter returns to 0.
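The counter approach described above could be sketched roughly as follows. The function name first_group is only illustrative, and returning None for a missing or unclosed group is an assumption about the desired behaviour:

```python
def first_group(s):
    """Return the first balanced parenthesised group in s, or None."""
    start = s.find('(')
    if start == -1:
        return None  # no opening parenthesis at all
    depth = 0
    for i in range(start, len(s)):
        if s[i] == '(':
            depth += 1
        elif s[i] == ')':
            depth -= 1
            if depth == 0:
                # The group opened at `start` has just been closed.
                return s[start:i + 1]
    return None  # the first group is never closed


print(first_group('a d f gsds ( adsd ) adsdaa'))  # ( adsd )
```

This makes a single pass over the string and stops as soon as the first group closes.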
I am wondering if there is a simpler way to do it? Maybe with regular expressions?
Thanks
Upvotes: 2
Views: 732
Reputation: 1928
You can use pyparsing to select all the sub-strings contained in the first group of nested parentheses from a string.
import pyparsing as pp
pattern = pp.Regex(r'.*?(?=\()') + pp.original_text_for(pp.nested_expr('(', ')'))
txt = 'a d f gsds ( adsd ) adsdaa'
result = pattern.parse_string(txt)[1]
assert result == '( adsd )'
txt = 'adadsa ( sadad adsads ( adsda ) dsadsa ) ( dsadsad )'
result = pattern.parse_string(txt)[1]
assert result == '( sadad adsads ( adsda ) dsadsa )'
txt = 'a ana anan anan ( adad ( sad ) sdada asdad ) ( sadad ( adasd ) asda ) sdafds ( afdasf )'
result = pattern.parse_string(txt)[1]
assert result == '( adad ( sad ) sdada asdad )'
* pyparsing can be installed with pip install pyparsing
If the parentheses are unbalanced (for example a(b(c) or a(b)c)), the result may be unexpected or an IndexError may be raised, so be careful. (See: Python extract string in a phrase)
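One possible guard is to check that the parentheses are balanced before parsing. The helper below is a plain-Python sketch and not part of pyparsing; the name is_balanced is hypothetical. It checks the whole string, which may be stricter than needed if only the first group matters:

```python
def is_balanced(s):
    """Return True if every '(' in s has a matching ')' in the right order."""
    depth = 0
    for ch in s:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:
                # A closing parenthesis appeared before its opening one.
                return False
    return depth == 0


for txt in ['a(b(c)', 'a(b)c)', 'a ( b ( c ) ) d']:
    print(txt, is_balanced(txt))
```

Only strings for which is_balanced returns True would then be passed to the parser.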
Upvotes: 1
Reputation: 1727
I wrote a little function:
def parens(s):
    start = s.find('(')
    i = s[start:s.find(')')].count('(')  # number of '(' before the first ')'
    groups = s[start:].split(')')  # split the rest of the string at every ')'
    print(')'.join(groups[:i]) + ')')  # rejoin as many pieces as '(' were counted
Demo:
>>> parens('a d f gsds ( adsd ) adsdaa')
( adsd )
>>> parens('adadsa ( sadad adsads ( adsda ) dsadsa ) ( dsadsad )')
( sadad adsads ( adsda ) dsadsa )
>>> parens('a ana anan anan ( adad ( sad ) sdada asdad ) ( sadad ( adasd ) asda ) sdafds ( afdasf )')
( adad ( sad ) sdada asdad )
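One caveat worth checking: because the count stops at the first ')', a group that contains several sibling nested groups appears to be cut short. A minimal sketch, using a hypothetical variant parens_str that returns the string instead of printing it:

```python
def parens_str(s):
    # Same logic as parens above, but returns the result instead of printing it.
    start = s.find('(')
    i = s[start:s.find(')')].count('(')  # number of '(' before the first ')'
    groups = s[start:].split(')')
    return ')'.join(groups[:i]) + ')'


s = 'x ( a ( b ) ( c ) d ) y'
print(parens_str(s))  # ( a ( b ) ( c )
```

The full first group here is ( a ( b ) ( c ) d ), so for inputs like this a depth counter that tracks every parenthesis may be the safer choice.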
Upvotes: 2