zaza

Reputation: 912

Get string value inside nested brackets at a given level

I'm trying to build a Python function that extracts the values inside nested brackets. So far I have tried looping over the characters while keeping track of the current nesting level, but I can't get it to work.

Given the string:

x = 'some Text (((X))(((X)Y)Z))'

I need to get the value that is nested at a given level. If there is more than one bracketed group at that level, all of them should be returned in a list. For example:

level | Output
  0   | ['some Text (((X))(((X)Y)Z))']
  1   | ['((X))(((X)Y)Z)']
  2   | ['(X)','((X)Y)Z']
  3   | ['X','(X)Y']
  4   | ['X']

[Image: colour example of the problem]

My current code is:

def get_nested_val(text,level):
    if level == 0:
        return text
    
    cur_lvl = 0 # what is the current nested level
    cur_word = [] # temp var to build words
    output = [] # output
    
    # for each char in word...
    for idx in range(len(text)):
        tok = text[idx]
        
        if tok == '(':
            # go up level when a '(' is found
            cur_lvl += 1 
        if tok == ')':
            # go down level when a ')' is found
            cur_lvl -= 1 
            cur_word.append(tok)
            
            # check the next character to see whether the word should be finished
            finish_current_word = False
            try:
                next_char = text[idx+1]
                if next_char == ')' and (cur_lvl-1) <= level:
                    continue
                elif next_char == '(' and (cur_lvl+1) >= level:
                    continue
                else:
                    finish_current_word = True
            except IndexError:
                finish_current_word = True
                
            if finish_current_word:
                output.append(''.join(cur_word))
                cur_word = []

        if cur_lvl > level:
            cur_word.append(tok)

    return output
    

x = 'some Text (((X))(((X)Y)Z))'
print(get_nested_val(x,4))
# level 1 : Expect: ['((X))(((X)Y)Z)'], actual ['((X))(((X)', ')Y)', ')Z))']
# level 2 : Expect: ['(X)','((X)Y)Z'],  actual ['(X))((X)', ')Y)', '))']
# level 3 : Expect: ['X','(X)Y'],       actual ['))', '(X)', ')', '))']
# level 4 : Expect: ['X'],              actual ['))', ')', ')', '))']

Upvotes: 1

Views: 935

Answers (2)

Ajax1234

Reputation: 71451

You can use a recursive generator function (the := operator used here requires Python 3.8 or later):

import re
x = 'some Text (((X))(((X)Y)Z))'
def build_result(s, l, f = False):
   t = ''
   while (n:=next(s, None)) is not None:
      if not l and (not f or n != ')'):
         t += n
      if n == ')':
         break
      if n == '(':
         if not l:
            t += ''.join(build_result(s, l if not l else l-1, l-1 == 0))
         else:
            yield from build_result(s, l if not l else l-1, l-1 == 0)
   yield from ([] if not t else [t])
        
def get_levels(text, level):
   # tokenize the text argument (not the global x); raw string avoids invalid-escape warnings
   return list(build_result(iter(re.findall(r'\(|\)|[\w\s]+', text)), level))

for i in range(5):
   print(i, get_levels(x, i))

Output:

0 ['some Text (((X))(((X)Y)Z))']
1 ['((X))(((X)Y)Z)']
2 ['(X)', '((X)Y)Z']
3 ['X', '(X)Y']
4 ['X']
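
For comparison, the same output can also be produced in a single pass without recursion. The following is only a sketch under stated assumptions, not the approach used above: it assumes the parentheses in the input are balanced (and raises ValueError otherwise), and the name spans_at_level is made up for illustration. It remembers the index of every '(' that is still open and, whenever a ')' closes a group whose depth equals the requested level, slices out the text between the two.

```python
def spans_at_level(text, level):
    """Return the substrings enclosed by parentheses at nesting depth `level`.

    Hypothetical helper for illustration; level 0 means the whole string.
    """
    if level == 0:
        return [text]

    open_positions = []  # indices of '(' characters that are still unclosed
    found = []
    for i, ch in enumerate(text):
        if ch == '(':
            open_positions.append(i)
        elif ch == ')':
            if not open_positions:
                raise ValueError(f"unmatched ')' at index {i}")
            depth = len(open_positions)  # depth of the group being closed
            start = open_positions.pop()
            if depth == level:
                # keep the contents only, without the enclosing pair
                found.append(text[start + 1:i])
    if open_positions:
        raise ValueError(f"unmatched '(' at index {open_positions[-1]}")
    return found


x = 'some Text (((X))(((X)Y)Z))'
for level in range(5):
    print(level, spans_at_level(x, level))
```

Groups at the same depth can never overlap, so collecting them in the order they close gives the same left-to-right order as the expected output in the question.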

Upvotes: 4

blhsing

Reputation: 106480

You can use a regex engine that supports recursion to match balanced constructs such as pairs of parentheses. The excellent regex package on PyPI is one such engine. With it, you can write a recursive function that finds all the balanced parenthesized constructs in the given text, strips the outermost parentheses from each match, and recursively searches the result with the level reduced by one, until the level reaches 0, at which point the text is what we want to output:

import regex

def get_nested_level(text, level):
    if level:
        for subtext in regex.findall(r'\((?:\w+|(?R))*\)', text):
            yield from get_nested_level(subtext[1:-1], level - 1)
    else:
        yield text

so that:

x = 'some Text (((X))(((X)Y)Z))'
for level in range(5):
    print(level, list(get_nested_level(x, level)))

outputs:

0 ['some Text (((X))(((X)Y)Z))']
1 ['((X))(((X)Y)Z)']
2 ['(X)', '((X)Y)Z']
3 ['X', '(X)Y']
4 ['X']

Demo: https://replit.com/@blhsing/SafeOnlyCharactermapping
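
If installing a third-party package is not an option, the same strip-and-recurse idea can be sketched with the standard library alone. This is an illustrative variant rather than the regex-based solution: the helper names top_level_groups and nested_level_stdlib are hypothetical, and a depth counter stands in for the recursive pattern. Unlike the pattern above, which only allows word characters between parentheses, this sketch accepts any characters inside a group, and it silently skips groups that are never closed.

```python
def top_level_groups(text):
    """Yield the contents of each outermost balanced (...) group in text."""
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == '(':
            if depth == 0:
                start = i + 1  # first character after the opening parenthesis
            depth += 1
        elif ch == ')' and depth > 0:  # a stray ')' at depth 0 is ignored
            depth -= 1
            if depth == 0:
                yield text[start:i]


def nested_level_stdlib(text, level):
    """Yield the texts found `level` parentheses deep, outermost groups first."""
    if level == 0:
        yield text
        return
    for inner in top_level_groups(text):
        # each group has already lost its outer pair, so go one level down
        yield from nested_level_stdlib(inner, level - 1)


x = 'some Text (((X))(((X)Y)Z))'
for level in range(5):
    print(level, list(nested_level_stdlib(x, level)))
```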

Upvotes: 1
