Reputation: 4661
I am given an expression using parantheses and +'s, such as (((a+b)+c)+(d+e)).
I need to find the parse tree of this, and then print the list form of this parse tree like: [ [ [a, b], c ], [d, e] ]
I was thinking I'd use something like ast, then ast2list. However, due to my not fully understanding these, I am repeatedly getting syntax errors. This is what I have:
import ast
import parser
a = ast.parse("(((a+b)+c)+(d+e))", mode='eval')
b = parser.ast2list(a)
print(b)
Could anyone guide me in the right direction? Thanks.
Upvotes: 1
Views: 585
Reputation: 3886
Look at the docs for the ast module here where the NodeVisitor
class is described.
import ast
import sys
class MyNodeVisitor(ast.NodeVisitor):
op_dict = {
ast.Add : '+',
ast.Sub : '-',
ast.Mult : '*',
}
type_dict = {
ast.BinOp: lambda s, n: s.handleBinOp(n),
ast.Name: lambda s, n: getattr(n, 'id'),
ast.Num: lambda s, n: getattr(n, 'n'),
}
def __init__(self, *args, **kwargs):
ast.NodeVisitor.__init__(self, *args, **kwargs)
self.ast = []
def handleBinOp(self, node):
return (self.op_dict[type(node.op)], self.handleNode(node.left),
self.handleNode(node.right))
def handleNode(self, node):
value = self.type_dict.get(type(node), None)
return value(self, node)
def visit_BinOp(self, node):
op = self.handleBinOp(node)
self.ast.append(op)
def visit_Name(self, node):
self.ast.append(node.id)
def visit_Num(self, node):
self.ast.append(node.n)
def currentTree(self):
return reversed(self.ast)
a = ast.parse(sys.argv[1])
visitor = MyNodeVisitor()
visitor.visit(a)
print list(visitor.currentTree())
Looks like this:
$ ./ast_tree.py "5 + (1 + 2) * 3"
[('+', 5, ('*', ('+', 1, 2), 3))]
Enjoy.
Upvotes: 2
Reputation: 63719
If you really want to do a parser, start by not writing any code, but by understanding how your grammar should work. Backus-Naur Format or BNF is the typical notation used to define your grammar. Infix notation is a common software engineering parsing topic, and the basic BNF structure for infix notation goes like:
letter ::= 'a'..'z'
operand ::= letter+
term ::= operand | '(' expr ')'
expr ::= term ( '+' term )*
The key is that term
contains either your alphabetic operand or an entire subexpression wrapped in ()'s. That subexpression is just the same as the overall expression, so this recursive definition takes care of all the parenthesis nesting. The expression then is a term followed by zero or more terms, added on using your binary '+' operator. (You could expand term
to handle subtraction and multiplication/division as well, but I'm not going to complicate this answer more than necessary.)
Pyparsing is a package that makes it easy to translate a BNF to a working parser using Python objects (Ply, spark, and yapps are other parsers, which follow the more traditional lex/yacc model of parser creation). Here is that BNF implemented directly using pyparsing:
from pyparsing import Suppress, Word, alphas, Forward, Group, ZeroOrMore
LPAR, RPAR, PLUS = map(Suppress, "()+")
operand = Word(alphas)
# forward declare our overall expression, necessary when defining a recursive grammar
expr = Forward()
# each term is either an alpha operand, or an expr in ()'s
term = operand | Group(LPAR + expr + RPAR)
# define expr as a term, with optional '+ term's
expr << term + ZeroOrMore(PLUS + term)
# try it out
s = "(((a+b)+c)+(d+e))"
print expr.parseString(s)
giving:
[[[['a', 'b'], 'c'], ['d', 'e']]]
Infix notation with recognition of precedence of operations is a pretty common parser, or part of a larger parser, so pyparsing includes a helper builtin call operatorPrecedence
to take care of all the nesting/grouping/recursion, etc. Here is that same parser written using operatorPrecedence
:
from pyparsing import operatorPrecedence, opAssoc, Word, alphas, Suppress
# define an infix notation with precedence of operations
# you only define one operation '+', so this is a simple case
operand = Word(alphas)
expr = operatorPrecedence(operand,
[
('+', 2, opAssoc.LEFT),
])
print expr.parseString(s)
giving the same results as before.
More detailed examples can be found online at the pyparsing wiki - the explicit implementation at fourFn.py and the operatorPrecedence implementation at simpleArith.py.
Upvotes: 2
Reputation: 150977
This is a simple enough problem that you could just write a solution from scratch. This assumes that all variable names are one-character long, or that the expression has been correctly converted into a list of tokens. I threw in checks to make sure all parenthesis are matched; obviously you should swap out CustomError
for whatever exception you want to throw or other action you want to take.
def expr_to_list(ex):
tree = []
stack = [tree]
for c in ex:
if c == '(':
new_node = []
stack[-1].append(new_node)
stack.append(new_node)
elif c == '+' or c == ' ':
continue
elif c == ')':
if stack[-1] == tree:
raise CustomError('Unmatched Parenthesis')
stack.pop()
else:
stack[-1].append(c)
if stack[-1] != tree:
raise CustomError('Unmatched Parenthesis')
return tree
Tested:
>>> expr_to_list('a + (b + c + (x + (y + z) + (d + e)))')
['a', ['b', 'c', ['x', ['y', 'z'], ['d', 'e']]]]
And for multi-character variable names, using a regex for tokenization:
>>> tokens = re.findall('\(|\)|\+|[\w]+',
'(apple + orange + (banana + grapefruit))')
>>> tokens
['(', 'apple', '+', 'orange', '+', '(', 'banana', '+', 'grapefruit', ')', ')']
>>> expr_to_list(tokens)
[['apple', 'orange', ['banana', 'grapefruit']]]
Upvotes: 0
Reputation: 56841
I would make a translater too. Doing it via ast was bit cumbersome to implement for this purpose.
[tw-172-25-24-198 ~]$ cat a1.py
import re
def multiple_replace(text, adict):
rx = re.compile('|'.join(map(re.escape, adict)))
def one_xlat(match):
return adict[match.group(0)]
return rx.sub(one_xlat, text)
# Closure based approach
def make_xlat(*args, **kwds):
adict = dict(*args, **kwds)
rx = re.compile('|'.join(map(re.escape, adict)))
def one_xlat(match):
return adict[match.group(0)]
def xlat(text):
return rx.sub(one_xlat, text)
return xlat
if __name__ == "__main__":
text = "((a+b)+c+(d+(e+f)))"
adict = {
"+":",",
"(":"[",
")":"]",
}
translate = make_xlat(adict)
print translate(text)
Should give
[[a,b],c,[d,[e,f]]]
Note - I have been having this snippet in my collections. It is from Python Cookbook. It does multiple replacements on the string, with the replacement key and values in the dictionary in a single pass.
Upvotes: 0
Reputation: 30210
Colleen's comment can be realized with something like:
str = "(((a+b)+c)+(d+e))"
replacements = [
('(','['),
(')',']'),
('+',','),
# If a,b,c,d,e are defined variables, you don't need the following 5 lines
('a',"'a'"),
('b',"'b'"),
('c',"'c'"),
('d',"'d'"),
('e',"'e'"),
]
for (f,s) in replacements:
str = str.replace(f,s)
obj = eval(str)
print(str) # [[['a','b'],'c'],['d','e']]
print(obj) # [[['a', 'b'], 'c'], ['d', 'e']]
# You can access the parsed elements as you would any iterable:
print(obj[0]) # [['a', 'b'], 'c']
print(obj[1]) # ['d', 'e']
print(obj[1][0]) # d
Upvotes: 2