Reputation:
I know there are many answers on how to split a string while respecting parentheses, but none of them do so recursively.
Looking at the string 1 2 3 (test 0, test 0) (test (0 test) 0):
Regex \s(?![^\(]*\))
returns "1", "2", "3", "(test 0, test 0)", "(test", "(0 test) 0)"
The regex I'm looking for would return either
"1", "2", "3", "(test 0, test 0)", "(test (0 test) 0)"
or
"1", "2", "3", "test 0, test 0", "test (0 test) 0"
which would let me recursively use it on the results again until no parentheses remain.
Ideally it would also respect escaped parentheses, but I only know the basics of regex and this is beyond me.
Does anyone have an idea how to approach this?
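For reference, the behaviour described above can be reproduced with the standard re module. This is only a minimal sketch of the failing attempt, not a solution: the negative lookahead sees a single nesting level, so the nested group is split in two.

```python
import re

text = "1 2 3 (test 0, test 0) (test (0 test) 0)"

# Split on whitespace unless a ")" follows before the next "(".
# The lookahead cannot count nesting depth, so "(test (0 test) 0)"
# is broken at the space right before the inner "(".
parts = re.split(r"\s(?![^\(]*\))", text)
print(parts)
# ['1', '2', '3', '(test 0, test 0)', '(test', '(0 test) 0)']
```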
Upvotes: 2
Views: 1083
Reputation: 1928
Alternatively, you can use pyparsing.
import pyparsing as pp
pattern = pp.ZeroOrMore(pp.Regex(r'\S+') ^ pp.original_text_for(pp.nested_expr('(', ')')))
# Tests
string = '1 2 3 (test 0, test 0) (test (0 test) 0)'
result = pattern.parse_string(string).as_list()
answer = ['1', '2', '3', '(test 0, test 0)', '(test (0 test) 0)']
assert result == answer
string = ''
result = pattern.parse_string(string).as_list()
answer = []
assert result == answer
string = 'a'
result = pattern.parse_string(string).as_list()
answer = ['a']
assert result == answer
string = ' a (1) ! '
result = pattern.parse_string(string).as_list()
answer = ['a', '(1)', '!']
assert result == answer
string = ' a (b) cd (e f) g hi (j (k l) m) (o p (qr (s t) u v) w (x y) z)'
result = pattern.parse_string(string).as_list()
answer = ['a', '(b)', 'cd', '(e f)', 'g', 'hi', '(j (k l) m)', '(o p (qr (s t) u v) w (x y) z)']
assert result == answer
* pyparsing can be installed with pip install pyparsing
In addition, you can directly parse all the nested parentheses at once:
pattern = pp.ZeroOrMore(pp.Regex(r'\S+') ^ pp.nested_expr('(', ')'))
string = '1 2 3 (test 0, test 0) (test (0 test) 0)'
result = pattern.parse_string(string).as_list()
answer = ['1', '2', '3', ['test', '0,', 'test', '0'], ['test', ['0', 'test'], '0']]
assert result == answer
* Whitespace is a delimiter in this case.
If the parentheses are unbalanced (for example a(b(c), a(b)c), etc.), an unexpected result is obtained or an IndexError is raised, so use this with care. (See: Python extract string in a phrase)
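One way to guard against such input is to check that the parentheses are balanced before parsing. The following is a minimal standard-library sketch; the helper name is_balanced is hypothetical and not part of pyparsing.

```python
def is_balanced(text: str) -> bool:
    """Return True if every "(" has a matching ")" in the right order."""
    depth = 0
    for char in text:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:  # a ")" appeared before its "("
                return False
    return depth == 0


print(is_balanced("1 2 3 (test (0 test) 0)"))  # True
print(is_balanced("a(b(c)"))  # False
print(is_balanced("a(b)c)"))  # False
```

Calling it before parse_string lets you reject or repair the input instead of relying on the parser's behaviour for malformed strings.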
Upvotes: 1
Reputation: 5281
Using only regex for the task might work, but it wouldn't be straightforward.
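Another possibility is writing a simple algorithm to track the parentheses in the string:
* Split the string at every parenthesis (re.split).
* Loop over the pieces, keeping two counters (start_parens_count for ( and end_parens_count for )).
* While inside parentheses, accumulate the text in a temporary variable (term).
* When both counters are equal, append the accumulated text (term) to the list of values & reset the counters/temp vars.
Here's an example: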
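import re

string = "1 2 3 (test 0, test 0) (test (0 test) 0)"
result, start_parens_count, end_parens_count, term = [], 0, 0, ""

# The capturing group keeps "(" and ")" as separate items in the split result.
for x in re.split(r"([()])", string):
    if not x.strip():
        continue
    elif x == "(":
        # Only nested "(" are kept in the term; the outermost one is dropped.
        if start_parens_count > 0:
            term += "("
        start_parens_count += 1
    elif x == ")":
        end_parens_count += 1
        if end_parens_count == start_parens_count:
            # The outermost group is closed: store it and reset the state.
            result.append(term)
            end_parens_count, start_parens_count, term = 0, 0, ""
        else:
            term += ")"
    elif start_parens_count > end_parens_count:
        # Text inside an open group.
        term += x
    else:
        # Text outside any group: split it on spaces.
        result.extend(x.strip(" ").split(" "))
print(result)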
# ['1', '2', '3', 'test 0, test 0', 'test (0 test) 0']
Not very elegant, but works.
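The same idea can be wrapped in a reusable function with a single depth counter and then applied recursively until no parentheses remain, as the question asks. This is only a sketch under some assumptions: the names split_top_level and parse are hypothetical, "escaped" is taken to mean a backslash before the parenthesis, and a group always starts a new token even without surrounding whitespace.

```python
def split_top_level(text: str) -> list[str]:
    """Split on whitespace outside parentheses; keep "(...)" groups whole."""
    parts: list[str] = []
    current: list[str] = []
    depth = 0

    def flush() -> None:
        # Emit the token collected so far, if any.
        if current:
            parts.append("".join(current))
            current.clear()

    i = 0
    while i < len(text):
        char = text[i]
        if char == "\\" and i + 1 < len(text):
            # Assumed escape rule: keep "\(" or "\)" verbatim, do not count it.
            current.append(text[i:i + 2])
            i += 2
            continue
        if char == "(":
            if depth == 0:
                flush()  # a group always starts a new token
            depth += 1
            current.append(char)
        elif char == ")":
            if depth == 0:
                raise ValueError(f"unmatched ')' at index {i}")
            depth -= 1
            current.append(char)
            if depth == 0:
                flush()  # the outermost group is complete
        elif char.isspace() and depth == 0:
            flush()
        else:
            current.append(char)
        i += 1
    if depth != 0:
        raise ValueError("unmatched '('")
    flush()
    return parts


def parse(text: str) -> list:
    """Recursively split until no parenthesised groups remain."""
    tree = []
    for part in split_top_level(text):
        if part.startswith("("):
            tree.append(parse(part[1:-1]))  # strip the outer pair and recurse
        else:
            tree.append(part)
    return tree


string = "1 2 3 (test 0, test 0) (test (0 test) 0)"
print(split_top_level(string))
# ['1', '2', '3', '(test 0, test 0)', '(test (0 test) 0)']
print(parse(string))
# ['1', '2', '3', ['test', '0,', 'test', '0'], ['test', ['0', 'test'], '0']]
```

Unlike the loop above, this version raises ValueError on unbalanced input instead of silently returning a partial result.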
Upvotes: 2
Reputation: 626896
You can install the PyPI regex module (pip install regex), which supports recursion, and use
import regex
text = "1 2 3 (test 0, test 0) (test (0 test) 0)"
matches = [match.group() for match in regex.finditer(r"(?:(\((?>[^()]+|(?1))*\))|\S)+", text)]
print(matches)
# => ['1', '2', '3', '(test 0, test 0)', '(test (0 test) 0)']
The regex matches:
(?: - start of a non-capturing group:
(\((?>[^()]+|(?1))*\)) - a text between any nested parentheses
| - or
\S - any non-whitespace char
)+ - end of the group, repeat one or more times
Upvotes: 1