Parse nested list from string that cannot be parsed with ast.literal_eval

I am parsing a file into a Python list and have encountered a nested list like this:

{   1   4{  2a  0.0 }{  3   0.0 }{  4c  0.0 }{  5   0.0 }   }

I want to interpret it as a list while keeping the nesting, so the resulting Python list should look like this:

[1,4,[2a,0.0],[3,0.0],[4c,0.0],[5,0.0]]

I managed to build a correctly formatted string from it with the following:

l = """{    1   4{  2   0.0 }{  3   0.0 }{  4   0.0 }{  5   0.0 }   }"""
l = l.replace("{\t",",[").replace("\t}","]").replace("{","[").replace("}","]").replace("\t",",")[1:]

I can also apply l.split("\t") to get a list, but that does not work for a nested list: the result is flattened, which I do not want.

I tried ast.literal_eval(l), but it fails on string tokens such as 2a.

Upvotes: 0

Views: 1115

Answers (2)

Laurent LAPORTE

Reputation: 22942

You can develop your own parser using RegEx. In your situation, it is not too difficult. You can parse the enclosing curly brackets, then split the items and evaluate each item recursively.

Here is an example (which is not perfect):

import re

RE_BRACE = r"\{.*\}"
RE_ITEM = r"\d+[a-z]+"
RE_FLOAT = r"[-+]?\d*\.\d+"
RE_INT = r"\d+"

find_all_items = re.compile(
    "|".join([RE_BRACE, RE_ITEM, RE_FLOAT, RE_INT]),
    flags=re.DOTALL).findall

def parse(text):
    mo = re.match(RE_BRACE, text, flags=re.DOTALL)
    if mo:
        content = mo.group()[1:-1]
        items = [parse(part) for part in find_all_items(content)]
        return items
    mo = re.match(RE_ITEM, text, flags=re.DOTALL)
    if mo:
        return mo.group()
    mo = re.match(RE_FLOAT, text, flags=re.DOTALL)
    if mo:
        return float(mo.group())
    mo = re.match(RE_INT, text, flags=re.DOTALL)
    if mo:
        return int(mo.group())
    raise Exception("Invalid text: {0}".format(text))

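Note: this parser cannot handle sibling groups such as {1 {2} {3} 4} correctly, because the greedy pattern \{.*\} matches from the first opening brace to the last closing brace. You need a recursive parser, such as one built with pyparsing, for that.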

Demo:

s = '''{   1   4{  2a  0.0 }{  3   0.0 }{  4c  0.0 }{  5   0.0 }   }'''

l = parse(s)
print(l)

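You get the following. The sibling groups end up nested inside one another, which is the limitation described in the note above: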

[1, 4, ['2a', 0.0, [3, 0.0, '4c', 0.0], 5, 0.0]]
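
If you would rather avoid a third-party library, the sibling-group limitation can be worked around by tokenizing first and then walking the tokens recursively. The following is only a minimal sketch, not a hardened parser: the names TOKEN, convert and parse_braces are made up for this example, and it assumes that tokens are separated by whitespace or braces and that anything int() or float() accepts should become a number.

```python
import re

# A token is either a single brace or a run of characters that are
# neither whitespace nor braces.
TOKEN = re.compile(r"[{}]|[^\s{}]+")


def convert(token):
    """Return the token as int or float when possible, else as a string."""
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token


def parse_braces(text):
    """Parse a brace-delimited string into nested Python lists."""
    tokens = TOKEN.findall(text)
    if not tokens or tokens[0] != "{":
        raise ValueError("text must start with an opening brace")
    pos = 1  # index of the next token to read (the first '{' is consumed)

    def parse_group():
        nonlocal pos
        items = []
        while pos < len(tokens):
            tok = tokens[pos]
            pos += 1
            if tok == "{":
                # A nested group: recurse and append its list as one item.
                items.append(parse_group())
            elif tok == "}":
                return items
            else:
                items.append(convert(tok))
        raise ValueError("missing closing brace")

    result = parse_group()
    if pos != len(tokens):
        raise ValueError("unexpected text after the closing brace")
    return result


s = "{   1   4{  2a  0.0 }{  3   0.0 }{  4c  0.0 }{  5   0.0 }   }"
print(parse_braces(s))
```

Because each closing brace ends exactly one recursive call, sibling groups such as {1 {2} {3} 4} stay separate instead of being swallowed by a greedy match.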

Upvotes: 1

PaulMcG

Reputation: 63709

Pyparsing has a built-in helper nestedExpr to help parse nested lists between opening and closing delimiters:

>>> import pyparsing as pp
>>> nested_braces = pp.nestedExpr('{', '}')
>>> t = """{   1   4{  2a  0.0 }{  3   0.0 }{  4c  0.0 }{  5   0.0 }   }"""
>>> print(nested_braces.parseString(t).asList())
[['1', '4', ['2a', '0.0'], ['3', '0.0'], ['4c', '0.0'], ['5', '0.0']]]
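
Note that every token comes back as a string, and the result is wrapped in one extra outer list. If you need numbers, one possible post-processing step is sketched below. The helper name convert_tokens is made up for this example and is not part of pyparsing; it assumes that any token accepted by int() or float() should be converted, and that everything else (such as 2a) stays a string. The input list is copied from the output above so that the snippet runs on its own.

```python
def convert_tokens(node):
    """Recursively convert numeric strings in nested lists to int or float."""
    if isinstance(node, list):
        return [convert_tokens(child) for child in node]
    for cast in (int, float):
        try:
            return cast(node)
        except ValueError:
            pass
    return node  # not numeric, keep the string as it is


# Nested list of strings, as printed by asList() above.
parsed = [['1', '4', ['2a', '0.0'], ['3', '0.0'], ['4c', '0.0'], ['5', '0.0']]]

# Take element 0 to drop the extra outer list.
result = convert_tokens(parsed)[0]
print(result)
```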

Upvotes: 7
