Reputation: 1
I have a string in the following format:
[KEY=VALUE, KEY2=VALUE, KEY3=VALUE, KEY4=VALUE_COMPLEX_CHARS, KEY={KEY=VALUE, COLLECTION[KEY][KEY]=VLAUE, COLLECTION[KEY][KEY][KEY]=VALUE}, KEY=VALUE] [TEXT] [CAN BE TEXT OR KEY=VAL]
I want to parse it using Python, mapping it into a dictionary or a list.
I'm able to parse it with the following code:
import re

# text holds the input string (renamed from str, which shadows the built-in type)
lst1 = [x[0] or x[1] for x in re.findall(r'\[(.*?)\]|\((.*?)\)', text)]
print(lst1[0])
The problem is that this code breaks when the string contains nested brackets, as in the example above. It works correctly when the input is simple:
[KEY1=VALUE, KEY2=VALUE, KEY=VALUE, KEY=VALUE_COMPLEX_CHARS KEY=VALUE] [TEXT] [CAN BE TEXT OR KEY=VAL]
The output is a list containing everything between each pair of brackets:
list[0] = [...]
list[1] = [...]
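To make the failure concrete, here is a minimal reproduction on shortened inputs. The strings `simple` and `nested` and the helper `groups` are made-up stand-ins for the real data and code; only the regex is taken from the snippet above. The non-greedy `.*?` stops at the first `]` it meets, so a bracket inside a key cuts the first group short.

```python
import re

# Same pattern as in the question: contents of [...] or (...)
pattern = r'\[(.*?)\]|\((.*?)\)'

# Hypothetical shortened inputs, standing in for the real strings
simple = "[A=1, B=2] [TEXT]"
nested = "[A=1, B={C[D]=2}] [TEXT]"


def groups(text):
    """Return whichever capture group matched for each hit."""
    return [m[0] or m[1] for m in re.findall(pattern, text)]


print(groups(simple))  # ['A=1, B=2', 'TEXT']
print(groups(nested))  # ['A=1, B={C[D', 'TEXT']
```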
Please help me fix the code above so that it can parse complex strings with nested brackets.
Thank you very much for your help.
Upvotes: 0
Views: 317
Reputation: 54698
I don't think this is exactly what you're looking for, but this shows how to handle nested structures like this.
s = "[KEY=VALUE, KEY2=VALUE, KEY3=VALUE, KEY4=VALUE_COMPLEX_CHARS, KEY={KEY=VALUE, COLLECTION[KEY][KEY]=VLAUE, COLLECTION[KEY][KEY][KEY]=VALUE}, KEY=VALUE] [TEXT] [CAN BE TEXT OR KEY=VAL]"
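def parse(s):
    # s is a list of characters; it is consumed in place and shared with recursive calls
    accum = []
    last = ''      # text collected since the last delimiter
    key = ''       # pending key, set when '=' is seen
    nested = 0     # depth of brackets that appear inside a token
    while s:
        c = s.pop(0)
        if c in '[{':
            if last:
                # bracket in the middle of a token: keep it as text
                last += c
                nested += 1
            else:
                # bracket at the start of a token: parse a nested group
                accum.append( parse(s) )
        elif c == ' ' and not last:
            continue
        elif c == '=':
            key = last
            last = ''
        elif c == ',':
            if key:
                accum.append( (key, last) )
            else:
                accum.append( last )
            key = ''
            last = ''
        elif c in ']}':
            if nested:
                last += c
                nested -= 1
            elif key:
                accum.append( (key, last) )
            elif last:
                accum.append( last )
            # any closing bracket ends this call, including one inside a token
            return accum
        else:
            last += c
    return accum

s = list(s)
accum = []
while s:
    accum.extend(parse(s))
print(accum)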
Output:
[[('KEY', 'VALUE'), ('KEY2', 'VALUE'), ('KEY3', 'VALUE'), ('KEY4', 'VALUE_COMPLEX_CHARS'), [('KEY', 'VALUE')], ['KEY'], 'VLAUE'], ['KEY'], ['KEY'], 'VALUE', '', ('KEY', 'VALUE'), ['TEXT'], [('CAN BE TEXT OR KEY', 'VAL')]]
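The output above splits keys such as COLLECTION[KEY][KEY] into separate pieces. If the goal is one entry per top-level bracket group with those keys kept whole, a depth counter may be enough. The following is only a sketch of that different technique, not a fix of the code above. The function names (`bracket_groups`, `split_top_level`, `parse_body`) and the short `sample` string are made up for illustration, and it assumes brackets are balanced, keys contain no `=`, and there is no escaping.

```python
OPENERS, CLOSERS = "[{", "]}"


def bracket_groups(text):
    """Return the contents of each top-level [...] group in text."""
    groups, depth, start = [], 0, None
    for i, ch in enumerate(text):
        if ch in OPENERS:
            if depth == 0 and ch == "[":
                start = i + 1  # first character after the outer '['
            depth += 1
        elif ch in CLOSERS:
            depth -= 1
            if depth == 0 and start is not None:
                groups.append(text[start:i])
                start = None
    return groups


def split_top_level(text, sep=","):
    """Split text on sep, skipping separators nested inside [] or {}."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch in OPENERS:
            depth += 1
        elif ch in CLOSERS:
            depth -= 1
        elif ch == sep and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return parts


def parse_body(body):
    """Parse 'K=V, K2={...}' into (key, value) pairs; text without '=' stays a string."""
    items = []
    for part in split_top_level(body):
        part = part.strip()
        if not part:
            continue
        key, eq, value = part.partition("=")  # assumes the key itself has no '='
        if not eq:
            items.append(part)  # free text such as TEXT
        elif value.startswith("{") and value.endswith("}"):
            items.append((key.strip(), parse_body(value[1:-1])))  # nested block
        else:
            items.append((key.strip(), value.strip()))
    return items


# Hypothetical shortened input with the same shape as the real string
sample = "[A=1, B={C=2, D[x][y]=3}, E=4] [TEXT] [K=V]"
result = [parse_body(group) for group in bracket_groups(sample)]
print(result)
```

Pairs are kept as a list of tuples rather than a dict because the sample data repeats KEY several times, and a dict would silently keep only the last value. Calling dict(...) on a group is an easy follow-up when keys are known to be unique.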
Upvotes: 1