Jakemmarsh
Jakemmarsh

Reputation: 4661

Parsing expression into list

I am given an expression using parantheses and +'s, such as (((a+b)+c)+(d+e)).

I need to find the parse tree of this, and then print the list form of this parse tree like: [ [ [a, b], c ], [d, e] ]

I was thinking I'd use something like ast, then ast2list. However, due to my not fully understanding these, I am repeatedly getting syntax errors. This is what I have:

import ast
import parser

a = ast.parse("(((a+b)+c)+(d+e))", mode='eval')
b = parser.ast2list(a)


print(b)

Could anyone guide me in the right direction? Thanks.

Upvotes: 1

Views: 585

Answers (5)

Sean Perry
Sean Perry

Reputation: 3886

Look at the docs for the ast module here where the NodeVisitor class is described.

import ast
import sys

class MyNodeVisitor(ast.NodeVisitor):
    op_dict = {
        ast.Add : '+',
        ast.Sub : '-',
        ast.Mult : '*',
    }

    type_dict = {
        ast.BinOp: lambda s, n: s.handleBinOp(n),
        ast.Name: lambda s, n: getattr(n, 'id'),
        ast.Num: lambda s, n: getattr(n, 'n'),
    }

    def __init__(self, *args, **kwargs):
        ast.NodeVisitor.__init__(self, *args, **kwargs)
        self.ast = []

    def handleBinOp(self, node):
        return (self.op_dict[type(node.op)], self.handleNode(node.left), 
                    self.handleNode(node.right))

    def handleNode(self, node):
        value = self.type_dict.get(type(node), None)
        return value(self, node)

    def visit_BinOp(self, node):
        op = self.handleBinOp(node)
        self.ast.append(op)

    def visit_Name(self, node):
        self.ast.append(node.id)

    def visit_Num(self, node):
        self.ast.append(node.n)

    def currentTree(self):
        return reversed(self.ast)

a = ast.parse(sys.argv[1])
visitor = MyNodeVisitor()
visitor.visit(a)
print list(visitor.currentTree())

Looks like this:

 $ ./ast_tree.py "5 + (1 + 2) * 3"
 [('+', 5, ('*', ('+', 1, 2), 3))]

Enjoy.

Upvotes: 2

PaulMcG
PaulMcG

Reputation: 63719

If you really want to do a parser, start by not writing any code, but by understanding how your grammar should work. Backus-Naur Format or BNF is the typical notation used to define your grammar. Infix notation is a common software engineering parsing topic, and the basic BNF structure for infix notation goes like:

letter ::= 'a'..'z'
operand ::= letter+
term ::= operand | '(' expr ')'
expr ::= term ( '+' term )*

The key is that term contains either your alphabetic operand or an entire subexpression wrapped in ()'s. That subexpression is just the same as the overall expression, so this recursive definition takes care of all the parenthesis nesting. The expression then is a term followed by zero or more terms, added on using your binary '+' operator. (You could expand term to handle subtraction and multiplication/division as well, but I'm not going to complicate this answer more than necessary.)

Pyparsing is a package that makes it easy to translate a BNF to a working parser using Python objects (Ply, spark, and yapps are other parsers, which follow the more traditional lex/yacc model of parser creation). Here is that BNF implemented directly using pyparsing:

from pyparsing import Suppress, Word, alphas, Forward, Group, ZeroOrMore

LPAR, RPAR, PLUS = map(Suppress, "()+")
operand = Word(alphas)

# forward declare our overall expression, necessary when defining a recursive grammar
expr = Forward()

# each term is either an alpha operand, or an expr in ()'s
term = operand | Group(LPAR + expr + RPAR)

# define expr as a term, with optional '+ term's
expr << term + ZeroOrMore(PLUS + term)

# try it out
s = "(((a+b)+c)+(d+e))"
print expr.parseString(s)

giving:

[[[['a', 'b'], 'c'], ['d', 'e']]]

Infix notation with recognition of precedence of operations is a pretty common parser, or part of a larger parser, so pyparsing includes a helper builtin call operatorPrecedence to take care of all the nesting/grouping/recursion, etc. Here is that same parser written using operatorPrecedence:

from pyparsing import operatorPrecedence, opAssoc, Word, alphas, Suppress

# define an infix notation with precedence of operations
# you only define one operation '+', so this is a simple case
operand = Word(alphas)
expr = operatorPrecedence(operand,
    [
    ('+', 2, opAssoc.LEFT),
    ])

print expr.parseString(s)

giving the same results as before.

More detailed examples can be found online at the pyparsing wiki - the explicit implementation at fourFn.py and the operatorPrecedence implementation at simpleArith.py.

Upvotes: 2

senderle
senderle

Reputation: 150977

This is a simple enough problem that you could just write a solution from scratch. This assumes that all variable names are one-character long, or that the expression has been correctly converted into a list of tokens. I threw in checks to make sure all parenthesis are matched; obviously you should swap out CustomError for whatever exception you want to throw or other action you want to take.

def expr_to_list(ex):
    tree = []
    stack = [tree]
    for c in ex:
        if c == '(':
            new_node = []
            stack[-1].append(new_node)
            stack.append(new_node)
        elif c == '+' or c == ' ':
            continue
        elif c == ')':
            if stack[-1] == tree:
                raise CustomError('Unmatched Parenthesis')
            stack.pop()
        else:
            stack[-1].append(c)
    if stack[-1] != tree:
        raise CustomError('Unmatched Parenthesis')
    return tree

Tested:

>>> expr_to_list('a + (b + c + (x + (y + z) + (d + e)))')
['a', ['b', 'c', ['x', ['y', 'z'], ['d', 'e']]]]

And for multi-character variable names, using a regex for tokenization:

>>> tokens = re.findall('\(|\)|\+|[\w]+', 
                        '(apple + orange + (banana + grapefruit))')
>>> tokens
['(', 'apple', '+', 'orange', '+', '(', 'banana', '+', 'grapefruit', ')', ')']
>>> expr_to_list(tokens)
[['apple', 'orange', ['banana', 'grapefruit']]]

Upvotes: 0

Senthil Kumaran
Senthil Kumaran

Reputation: 56841

I would make a translater too. Doing it via ast was bit cumbersome to implement for this purpose.

[tw-172-25-24-198 ~]$ cat a1.py 
import re

def multiple_replace(text, adict):
    rx = re.compile('|'.join(map(re.escape, adict)))
    def one_xlat(match):
        return adict[match.group(0)]
    return rx.sub(one_xlat, text)

# Closure based approach
def make_xlat(*args, **kwds):
    adict = dict(*args, **kwds)
    rx = re.compile('|'.join(map(re.escape, adict)))
    def one_xlat(match):
        return adict[match.group(0)]
    def xlat(text):
        return rx.sub(one_xlat, text)
    return xlat

if __name__ == "__main__":
    text = "((a+b)+c+(d+(e+f)))"
    adict = {
        "+":",",
        "(":"[",
        ")":"]",
    }
    translate = make_xlat(adict)
    print translate(text)

Should give

[[a,b],c,[d,[e,f]]]

Note - I have been having this snippet in my collections. It is from Python Cookbook. It does multiple replacements on the string, with the replacement key and values in the dictionary in a single pass.

Upvotes: 0

jedwards
jedwards

Reputation: 30210

Colleen's comment can be realized with something like:

str = "(((a+b)+c)+(d+e))"


replacements = [
    ('(','['),
    (')',']'),
    ('+',','),
    # If a,b,c,d,e are defined variables, you don't need the following 5 lines
    ('a',"'a'"),
    ('b',"'b'"),
    ('c',"'c'"),
    ('d',"'d'"),
    ('e',"'e'"),
]

for (f,s) in replacements:
    str = str.replace(f,s)

obj = eval(str)

print(str)       # [[['a','b'],'c'],['d','e']]
print(obj)       # [[['a', 'b'], 'c'], ['d', 'e']]
# You can access the parsed elements as you would any iterable:
print(obj[0])    # [['a', 'b'], 'c']
print(obj[1])    # ['d', 'e']
print(obj[1][0]) # d

Upvotes: 2

Related Questions