Work of Artiz
Work of Artiz

Reputation: 1100

Parsing Logic Expressions with Pyparsing

To start of, this is not homework, I'm trying to learn pyparsing and I got stuck here.

My question is as follows, I'm trying to parse statements like (abc or def) or def)

My program goes to shit on an infix expression a or b, since both sides can be expressions themselves, which can again be infix expressions, the parser recurses until recursion depth is reached and no work gets done.

Code below:

# infix operators are automatically created and dealt with
infix_operators  = ['and', '&', 'or', '|', 'implies', '->']

variable  = Word(alphas)
infix_op  = oneOf(infix_operators, caseless=True)

expr         = Forward()
infix_expr   = (expr + infix_op + expr)
complex_expr = nestedExpr('(', ')', content=expr)
expr        << (infix_expr | complex_expr | variable)

print str(expr.parseString("(abc or def) or def)")[0])

My question is fairly simple; how would one go about avoiding an infinite loop in these kinds of situations?

Upvotes: 1

Views: 1175

Answers (1)

PaulMcG
PaulMcG

Reputation: 63792

The canonical solution is something that implements this BNF:

atom   := variable | 'True' | 'False' | '(' expr ')'
factor := [ 'not' ]... atom
term   := factor [ '&' factor ]...
expr   := term [ '|' term ]...

The left-recursion problem is addressed because, even though expr eventually recurses through term -> factor -> atom, when it gets to expr, it first has to parse a leading '('. So an expr never has to first parse a deeper expr before parsing some other elements first.

This BNF translates almost directly to pyparsing as:

and_ = Keyword('and')
or_ = Keyword('or')
not_ = Keyword('not')
true_ = Keyword('true')
false_ = Keyword('false')

not_op = not_ | '~'
and_op = and_ | '&'
or_op = or_ | '|'

expr = Forward()

identifier = ~(and_ | or_ | not_ | true_ | false_) + Word(alphas)
atom = identifier | Group('(' + expr + ')')
factor = Group(ZeroOrMore(not_op) + atom)
term = Group(factor + ZeroOrMore(and_op + factor))
expr <<= Group(term + ZeroOrMore(or_op + term))

Or you can use pyparsing's infixNotation helper:

expr = infixNotation(true_ | false_ | identifier,
    [
    (not_op, 1, opAssoc.RIGHT),
    (and_op, 2, opAssoc.LEFT),
    (or_op, 2, opAssoc.LEFT),
    ])

infixNotation is constructed with a base operand (in this case, either an alpha variable name or one of the boolean literals true or false), followed by a list of (operator, arity, associativity) tuples, given in order of operator precedence. infixNotation takes care of all the recursion definitions, the parsing of right-associative vs. left-associative operators, and also does some lookahead for operators, to avoid extra nesting of operations for a given precedence level if there are no operators.

You can test this expression using pyparsing's runTests method:

expr.runTests("""
    p and not q
    not p or p
    r and (p or q)
    r and p or q
    not q
    q
    """, fullDump=False)

Giving:

p and not q
[['p', 'and', ['not', 'q']]]

not p or p
[[['not', 'p'], 'or', 'p']]

r and (p or q)
[['r', 'and', ['p', 'or', 'q']]]

r and p or q
[[['r', 'and', 'p'], 'or', 'q']]

not q
[['not', 'q']]

q
['q']

Upvotes: 4

Related Questions