Reputation: 1220
I am parsing a file into a Python list and have encountered a nested list like this:
{ 1 4{ 2a 0.0 }{ 3 0.0 }{ 4c 0.0 }{ 5 0.0 } }
I want to interpret it as a nested list, so the resulting Python list should look like this:
[1,4,[2a,0.0],[3,0.0],[4c,0.0],[5,0.0]]
I managed to build a correct string representation of it with the following:
l = """{ 1 4{ 2 0.0 }{ 3 0.0 }{ 4 0.0 }{ 5 0.0 } }"""
l = l.replace("{\t",",[").replace("\t}","]").replace("{","[").replace("}","]").replace("\t",",")[1:]
I can also apply l.strip("\t") to get a list, but that does not work for nested input: the result would be flattened, which I do not want.
I tried ast.literal_eval(l), but it fails on unquoted strings such as 2a.
Upvotes: 0
Views: 1115
Reputation: 22942
You can develop your own parser using RegEx. In your situation, it is not too difficult. You can parse the enclosing curly brackets, then split the items and evaluate each item recursively.
Here is an example (which is not perfect):
import re

RE_BRACE = r"\{.*\}"           # a brace-delimited group (greedy)
RE_ITEM = r"\d+[a-z]+"         # digits followed by letters, e.g. 2a
RE_FLOAT = r"[-+]?\d*\.\d+"    # a float such as 0.0
RE_INT = r"\d+"                # a plain integer

# Order matters: try the most specific alternatives first.
find_all_items = re.compile(
    "|".join([RE_BRACE, RE_ITEM, RE_FLOAT, RE_INT]),
    flags=re.DOTALL).findall


def parse(text):
    # A brace group: strip the outer braces and parse each item recursively.
    mo = re.match(RE_BRACE, text, flags=re.DOTALL)
    if mo:
        content = mo.group()[1:-1]
        items = [parse(part) for part in find_all_items(content)]
        return items
    # An alphanumeric item is kept as a string.
    mo = re.match(RE_ITEM, text, flags=re.DOTALL)
    if mo:
        return mo.group()
    mo = re.match(RE_FLOAT, text, flags=re.DOTALL)
    if mo:
        return float(mo.group())
    mo = re.match(RE_INT, text, flags=re.DOTALL)
    if mo:
        return int(mo.group())
    raise Exception("Invalid text: {0}".format(text))
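Note: this parser cannot parse {1 {2} {3} 4} correctly. RE_BRACE is greedy, so it matches from the first opening brace to the last closing brace and merges sibling groups. You need a real recursive parser, such as one built with pyparsing, for that.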
Demo:
s = '''{ 1 4{ 2a 0.0 }{ 3 0.0 }{ 4c 0.0 }{ 5 0.0 } }'''
l = parse(s)
print(l)
You get the following (the sibling groups are not separated the way the question asks):
[1, 4, ['2a', 0.0, [3, 0.0, '4c', 0.0], 5, 0.0]]
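For comparison, here is a minimal sketch of a hand-written parser that keeps sibling groups separate. It is not part of the answer above: it swaps the regex matching of whole brace groups for a tokenizer plus an explicit stack. The names TOKEN_RE, convert and parse_braces are made up for this illustration, and it assumes that tokens are separated by whitespace or braces and that anything which is not an int or a float should stay a string.

```python
import re

# A token is an opening brace, a closing brace, or a run of
# characters that are neither whitespace nor braces.
TOKEN_RE = re.compile(r"[{}]|[^\s{}]+")


def convert(token):
    """Return an int or a float if the token parses as one, else the string."""
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token


def parse_braces(text):
    """Parse a brace-delimited string into nested Python lists."""
    root = []
    stack = [root]  # stack[-1] is the list currently being filled
    for token in TOKEN_RE.findall(text):
        if token == "{":
            child = []
            stack[-1].append(child)
            stack.append(child)
        elif token == "}":
            if len(stack) == 1:
                raise ValueError("unbalanced closing brace")
            stack.pop()
        else:
            stack[-1].append(convert(token))
    if len(stack) != 1:
        raise ValueError("missing closing brace")
    return root


s = "{ 1 4{ 2a 0.0 }{ 3 0.0 }{ 4c 0.0 }{ 5 0.0 } }"
# The root list holds one element per top-level group, hence the [0].
print(parse_braces(s)[0])
# [1, 4, ['2a', 0.0], [3, 0.0], ['4c', 0.0], [5, 0.0]]
```

Because every opening brace pushes a new list and every closing brace pops one, an input such as {1 {2} {3} 4} comes out as [1, [2], [3], 4] with the groups kept apart.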
Upvotes: 1
Reputation: 63709
Pyparsing has a built-in helper, nestedExpr, to help parse nested lists between opening and closing delimiters:
>>> import pyparsing as pp
>>> nested_braces = pp.nestedExpr('{', '}')
>>> t = """{ 1 4{ 2a 0.0 }{ 3 0.0 }{ 4c 0.0 }{ 5 0.0 } }"""
>>> print(nested_braces.parseString(t).asList())
[['1', '4', ['2a', '0.0'], ['3', '0.0'], ['4c', '0.0'], ['5', '0.0']]]
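As the output shows, every token comes back as a string and the result is wrapped in one extra outer list. If numeric values are wanted, as in the question, a small post-processing step can convert them. The following is only a sketch: to_number and convert_nested are hypothetical helper names, not part of pyparsing, and the input literal is copied from the output above so that the snippet runs without pyparsing installed.

```python
def to_number(token):
    """Return an int or a float if the token parses as one, else the string."""
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token


def convert_nested(items):
    """Recursively convert numeric strings inside nested lists."""
    return [
        convert_nested(item) if isinstance(item, list) else to_number(item)
        for item in items
    ]


# Result of nested_braces.parseString(t).asList(), copied from above.
parsed = [['1', '4', ['2a', '0.0'], ['3', '0.0'], ['4c', '0.0'], ['5', '0.0']]]

# Index 0 drops the extra outer list.
print(convert_nested(parsed)[0])
# [1, 4, ['2a', 0.0], [3, 0.0], ['4c', 0.0], [5, 0.0]]
```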
Upvotes: 7