karaoke02

Reputation: 1

Parsing a string with nested brackets

I have a string in the following format:

[KEY=VALUE, KEY2=VALUE, KEY3=VALUE, KEY4=VALUE_COMPLEX_CHARS, KEY={KEY=VALUE, COLLECTION[KEY][KEY]=VLAUE, COLLECTION[KEY][KEY][KEY]=VALUE}, KEY=VALUE] [TEXT] [CAN BE TEXT OR KEY=VAL]

I want to parse it in Python, mapping it into a dictionary or a list.

I'm able to parse it with the following code:

import re
# s is the input string (naming it str would shadow the built-in)
lst1 = [x[0] or x[1] for x in re.findall(r'\[(.*?)\]|\((.*?)\)', s)]
print(lst1[0])

The problem is that this regex breaks when the string contains nested brackets, as in the example above. It works correctly when the input is simple:

[KEY1=VALUE, KEY2=VALUE, KEY=VALUE, KEY=VALUE_COMPLEX_CHARS KEY=VALUE] [TEXT] [CAN BE TEXT OR KEY=VAL]

The output is a list containing the contents of each bracketed group:

list[0] = [...]
list[1] = [...]
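To see why nesting breaks the pattern, here is a sketch that runs the same regular expression on the nested example (the names `PATTERN`, `nested`, and `groups` are only for illustration). The lazy `.*?` stops at the first `]` it finds, which is the one inside `COLLECTION[KEY]`, so the first group is cut short and every later `[KEY]` comes back as an item of its own.

```python
import re

# Same pattern as in the question: capture the contents of [...] or (...)
PATTERN = r'\[(.*?)\]|\((.*?)\)'

nested = ("[KEY=VALUE, KEY2=VALUE, KEY3=VALUE, KEY4=VALUE_COMPLEX_CHARS, "
          "KEY={KEY=VALUE, COLLECTION[KEY][KEY]=VLAUE, "
          "COLLECTION[KEY][KEY][KEY]=VALUE}, KEY=VALUE] [TEXT] [CAN BE TEXT OR KEY=VAL]")

# The lazy .*? ends each match at the first ']' after the '[', nested or not
groups = [x[0] or x[1] for x in re.findall(PATTERN, nested)]
for g in groups:
    print(repr(g))
```

It prints seven items: a truncated first group ending in `COLLECTION[KEY`, four separate `KEY` items, then `TEXT` and `CAN BE TEXT OR KEY=VAL`.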

Please help me adapt the code above so it can parse strings with nested brackets.

Thank you very much for your help.

Upvotes: 0

Views: 317

Answers (1)

Tim Roberts

Reputation: 54698

This may not be exactly what you're looking for, but it shows how to handle nested structures like this one.
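The key idea is to track bracket depth instead of stopping at the first `]`. On its own, that is enough to recover the list the question asks for, i.e. the raw text between each top-level pair of brackets. A minimal sketch, assuming the brackets are balanced (the function name `top_level_groups` is hypothetical, and it does not check that `[` pairs with `]` rather than `}`):

```python
OPENERS = '[{'
CLOSERS = ']}'


def top_level_groups(text):
    """Return the text inside each top-level bracket group.

    Nested [...] and {...} inside a group are kept verbatim.
    Assumes balanced brackets; openers and closers are not matched by type.
    """
    groups = []
    depth = 0
    start = 0
    for i, c in enumerate(text):
        if c in OPENERS:
            if depth == 0:
                start = i + 1  # contents begin just after the outermost opener
            depth += 1
        elif c in CLOSERS and depth > 0:
            depth -= 1
            if depth == 0:
                groups.append(text[start:i])  # outermost closer: one group done
    return groups


s = ("[KEY=VALUE, KEY2=VALUE, KEY3=VALUE, KEY4=VALUE_COMPLEX_CHARS, "
     "KEY={KEY=VALUE, COLLECTION[KEY][KEY]=VLAUE, "
     "COLLECTION[KEY][KEY][KEY]=VALUE}, KEY=VALUE] [TEXT] [CAN BE TEXT OR KEY=VAL]")

lst1 = top_level_groups(s)
print(lst1[0])
```

Here `lst1` has three entries, and `lst1[0]` is the whole first group, from `KEY=VALUE` through the nested `{...}` block to the final `KEY=VALUE`. Splitting each group into `KEY=VALUE` items still needs a parser along the lines of the one below.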

s = "[KEY=VALUE, KEY2=VALUE, KEY3=VALUE, KEY4=VALUE_COMPLEX_CHARS, KEY={KEY=VALUE, COLLECTION[KEY][KEY]=VLAUE, COLLECTION[KEY][KEY][KEY]=VALUE}, KEY=VALUE] [TEXT] [CAN BE TEXT OR KEY=VAL]"

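def parse(s):
    # s is a list of characters; consume them from the front and return
    # the items of one bracketed group
    accum = []   # items collected for the current group
    last = ''    # text of the token being built
    key = ''     # pending key, set when '=' is seen
    nested = 0   # depth of brackets embedded in a token, e.g. COLLECTION[KEY]
    while s:
        c = s.pop(0)
        if c in '[{':
            if last:
                # Bracket inside a token: keep it as part of the token text
                last += c
                nested += 1
            else:
                # Bracket at the start of a token: parse the group recursively
                accum.append( parse(s) )
        elif c == ' ' and not last:
            continue  # skip spaces before a token
        elif c == '=':
            # Text so far is the key; start collecting the value
            key = last
            last = ''
        elif c == ',':
            # End of an item: store (key, value), or the bare value if no '='
            if key:
                accum.append( (key, last) )
            else:
                accum.append( last )
            key = ''
            last = ''
        elif c in ']}':
            if nested:
                last += c
                nested -= 1
            elif key:
                accum.append( (key, last) )
            elif last:
                accum.append( last )
            # This return also runs after an embedded bracket closes (the
            # nested branch), so the token in last is discarded there
            return accum
        else:
            last += c
    return accum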

s = list(s)
accum = []
while s:
    accum.extend(parse(s))

print(accum)

Output:

[[('KEY', 'VALUE'), ('KEY2', 'VALUE'), ('KEY3', 'VALUE'), ('KEY4', 'VALUE_COMPLEX_CHARS'), [('KEY', 'VALUE')], ['KEY'], 'VLAUE'], ['KEY'], ['KEY'], 'VALUE', '', ('KEY', 'VALUE'), ['TEXT'], [('CAN BE TEXT OR KEY', 'VAL')]]
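The output above loses the `COLLECTION[...]` keys: after a `]` that closes an embedded bracket, the function still reaches `return accum`, so the rest of the group spills into the caller. A sketch of a variant that only returns at the group's own closing bracket, and attaches a nested `{...}` group to its key, might look like the following. It assumes well-formed input like the sample; the names `parse_brackets`, `_parse_group`, and `_finish` are hypothetical. Items stay as `(key, value)` tuples rather than a dictionary because the sample repeats keys such as `KEY`, and `dict()` would keep only the last value for each key.

```python
def _finish(items, key, value):
    """Append one finished item: (key, value) if a '=' was seen, else the bare value."""
    if isinstance(value, str):
        value = value.strip()
        if key is None and not value:
            return  # skip empty items, e.g. from a trailing comma
    items.append((key.strip(), value) if key is not None else value)


def _parse_group(text, pos):
    """Parse items from text[pos:] up to the group's closing ']' or '}'.

    Returns (items, index just past the closer).
    """
    items = []
    key = None
    value = ''  # a str while reading text, a list once a nested group was parsed
    depth = 0   # brackets embedded in a token, e.g. COLLECTION[KEY][KEY]
    while pos < len(text):
        c = text[pos]
        if c in ']}' and depth == 0:
            _finish(items, key, value)  # this group's own closer
            return items, pos + 1
        if c == ',' and depth == 0:
            _finish(items, key, value)
            key, value = None, ''
        elif isinstance(value, list):
            pass  # after a nested group, ignore text up to the next ',' or closer
        elif c in '[{':
            if depth == 0 and not value.strip():
                # A group at the start of a value: parse it recursively
                value, pos = _parse_group(text, pos + 1)
                continue
            depth += 1  # bracket inside a token: keep it as text
            value += c
        elif c in ']}':
            depth -= 1  # closes a bracket embedded in a token
            value += c
        elif c == '=' and depth == 0 and key is None:
            key, value = value, ''
        else:
            value += c
        pos += 1
    _finish(items, key, value)  # unterminated group: keep what was read
    return items, pos


def parse_brackets(text):
    """Parse every top-level [...] or {...} group in text into a list of items."""
    groups = []
    pos = 0
    while pos < len(text):
        if text[pos] in '[{':
            items, pos = _parse_group(text, pos + 1)
            groups.append(items)
        else:
            pos += 1  # skip text between top-level groups
    return groups


s = ("[KEY=VALUE, KEY2=VALUE, KEY3=VALUE, KEY4=VALUE_COMPLEX_CHARS, "
     "KEY={KEY=VALUE, COLLECTION[KEY][KEY]=VLAUE, "
     "COLLECTION[KEY][KEY][KEY]=VALUE}, KEY=VALUE] [TEXT] [CAN BE TEXT OR KEY=VAL]")

result = parse_brackets(s)
for group in result:
    print(group)
```

For the sample string this yields three groups, `['TEXT']` and `[('CAN BE TEXT OR KEY', 'VAL')]` among them; the first group's fifth item is `('KEY', [('KEY', 'VALUE'), ('COLLECTION[KEY][KEY]', 'VLAUE'), ('COLLECTION[KEY][KEY][KEY]', 'VALUE')])`.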

Upvotes: 1
