Reputation: 912
I'm trying to build a Python function that parses the values inside nested brackets. So far I have tried looping over the characters while keeping track of the nesting level, but I am not having any luck.
Given the string:
x = 'some Text (((X))(((X)Y)Z))'
I need to get the value that is nested at a given level. If there is more than one bracketed group at that level, return them all in a list. For example:
level | Output
0 | ['some Text (((X))(((X)Y)Z))']
1 | ['((X))(((X)Y)Z)']
2 | ['(X)','((X)Y)Z']
3 | ['X','(X)Y']
4 | ['X']
Colour example of the problem:
My current code is:
def get_nested_val(text,level):
    if level == 0:
        return text
    cur_lvl = 0 # what is the current nested level
    cur_word = [] # temp var to build words
    output = [] # output
    # for each char in word...
    for idx in range(len(text)):
        tok = text[idx]
        if tok == '(':
            # go up level when a '(' is found
            cur_lvl += 1
        if tok == ')':
            # go down level when a ')' is found
            cur_lvl -= 1
            cur_word.append(tok)
        # check the next character to check if the word should be finished
        finish_current_word = False
        try:
            next_char = text[idx+1]
            if next_char == ')' and (cur_lvl-1) <= level:
                continue
            elif next_char == '(' and (cur_lvl+1) >= level:
                continue
            else:
                finish_current_word = True
        except IndexError:
            finish_current_word = True
        if finish_current_word:
            output.append(''.join(cur_word))
            cur_word = []
        if cur_lvl > level:
            cur_word.append(tok)
    return output
x = 'some Text (((X))(((X)Y)Z))'
print(get_nested_val(x,4))
# level 1 : Expect: ['((X))(((X)Y)Z)'], actual ['((X))(((X)', ')Y)', ')Z))']
# level 2 : Expect: ['(X)','((X)Y)Z'], actual ['(X))((X)', ')Y)', '))']
# level 3 : Expect: ['X','(X)Y'], actual ['))', '(X)', ')', '))']
# level 4 : Expect: ['X'], actual ['))', ')', ')', '))']
Upvotes: 1
Views: 935
Reputation: 71451
You can use a recursive generator function:
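import re
x = 'some Text (((X))(((X)Y)Z))'
def build_result(s, l, f = False):
    # s: token iterator; l: levels still to descend;
    # f: True when this call starts exactly at the target level, so its closing ')' is dropped
    t = ''
    while (n:=next(s, None)) is not None:  # walrus operator: Python 3.8+
        if not l and (not f or n != ')'):
            t += n
        if n == ')':
            break
        if n == '(':
            if not l:
                # at or below the target level: keep nested text as part of this string
                t += ''.join(build_result(s, l if not l else l-1, l-1 == 0))
            else:
                # above the target level: descend and pass the results through
                yield from build_result(s, l if not l else l-1, l-1 == 0)
    yield from ([] if not t else [t])
def get_levels(text, level):
    # tokenize the text argument (not the global x) into '(' , ')' and runs of word/space characters
    return list(build_result(iter(re.findall(r'\(|\)|[\w\s]+', text)), level))
for i in range(5):
    print(i, get_levels(x, i))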
Output:
0 ['some Text (((X))(((X)Y)Z))']
1 ['((X))(((X)Y)Z)']
2 ['(X)', '((X)Y)Z']
3 ['X', '(X)Y']
4 ['X']
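To see what the generator actually consumes, here is a small sketch of the tokenization step on its own. The helper name tokenize is hypothetical and introduced only for illustration; the pattern is the same one passed to re.findall above.

```python
import re

def tokenize(text):
    # Hypothetical helper: each token is a single '(' or ')',
    # or a run of word/whitespace characters.
    return re.findall(r'\(|\)|[\w\s]+', text)

tokens = tokenize('some Text (((X))(((X)Y)Z))')
print(tokens)
# ['some Text ', '(', '(', '(', 'X', ')', ')', '(', '(', '(', 'X', ')', 'Y', ')', 'Z', ')', ')']
```

Note that re.findall silently skips any character the pattern does not match, so punctuation inside the brackets would be dropped with this pattern.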
Upvotes: 4
Reputation: 106480
You can use a regex engine that supports recursive patterns to match balanced constructs such as pairs of parentheses. The excellent regex package on PyPI is one such engine. With it, you can write a recursive function that finds all the balanced parenthesized groups in the given text, strips the outermost parentheses from each match, and recurses into the result with the level reduced by one. When the level reaches 0, the text is what we want to output:
import regex
def get_nested_level(text, level):
    if level:
        for subtext in regex.findall(r'\((?:\w+|(?R))*\)', text):
            yield from get_nested_level(subtext[1:-1], level - 1)
    else:
        yield text
so that:
x = 'some Text (((X))(((X)Y)Z))'
for level in range(5):
    print(level, list(get_nested_level(x, level)))
outputs:
0 ['some Text (((X))(((X)Y)Z))']
1 ['((X))(((X)Y)Z)']
2 ['(X)', '((X)Y)Z']
3 ['X', '(X)Y']
4 ['X']
Demo: https://replit.com/@blhsing/SafeOnlyCharactermapping
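If installing a third-party package is not an option, the same strategy (find the outermost balanced groups, strip their parentheses, recurse with one less level) can be sketched with the standard library alone by counting depth instead of using a recursive pattern. This is a minimal sketch, not a drop-in equivalent: the names top_level_groups and get_nested_level_stdlib are hypothetical, and unlike the pattern above it accepts any characters inside the parentheses, not only word characters.

```python
def top_level_groups(text):
    """Yield the contents of each outermost balanced (...) group in text."""
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == '(':
            if depth == 0:
                start = i + 1  # first character after the opening parenthesis
            depth += 1
        elif ch == ')' and depth > 0:  # stray ')' at depth 0 is ignored
            depth -= 1
            if depth == 0:
                yield text[start:i]  # outer parentheses are already stripped

def get_nested_level_stdlib(text, level):
    if level == 0:
        yield text
    else:
        for inner in top_level_groups(text):
            yield from get_nested_level_stdlib(inner, level - 1)

x = 'some Text (((X))(((X)Y)Z))'
for level in range(5):
    print(level, list(get_nested_level_stdlib(x, level)))
```

For the example string this prints the same five lines as the output shown above. A group that is opened but never closed is silently skipped in this sketch.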
Upvotes: 1