stackystackstack
stackystackstack

Reputation: 31

Pyparsing using Forward to parse recursive expressions

I am trying to recursively parse an expression. I followed a few tutorials, and it seems like Forward() is the class that I need. However, something seemingly simple is causing me trouble.

Here is the code I wrote

from pyparsing import *
exp = Forward()
integer = Word(nums)
exp << (integer | (exp + '+' + exp))
input = "1+1"
print exp.parseString(input)

I want it to return ['1','+','1'] but it only returns ['1']

Help is much appreciated.

Upvotes: 3

Views: 1433

Answers (2)

relent95
relent95

Reputation: 4742

I recently came to this problem. Now the left recursion is supported on PyParsing 3.0.0b2 or later. But the feature should be explicitly enabled, and the order of operands of the | operator should be adjusted like this.

from pyparsing import ParserElement, Forward, Word, nums

ParserElement.enable_left_recursion()
exp = Forward()
integer = Word(nums)
exp << ((exp + '+' + exp) | integer)
input = "1+1"
print(exp.parseString(input))

This will output the following.

['1', '+', '1']

Upvotes: 1

PaulMcG
PaulMcG

Reputation: 63782

There are several issues you have here. In ascending order of importance:

  1. parseString will not raise an exception if there is extra text after the parsed content. Use exp.parseString(input, parseAll=True)
  2. '|' is MatchFirst, not MatchLongest. Since your integer is first, it will be matched first. Then the parser fails on the '+' all by itself. If you want match longest, use '^' operator.
  3. The Biggie: once you convert to '^' (or reorder the expressions to put exp + exp first, ahead of integer), you will find yourself blowing up the maximum recursion depth. That is because this parser has left-recursive definition of exp. That is, to parse an exp, it has to parse an exp, for which it has to parse an exp, etc. In general, many published BNFs use recursion to describe this kind of repetitive structure, but pyparsing does not do the necessary lookahead/backtrack for that to work. Try exp <<= integer + ZeroOrMore('+' + integer) | '(' + exp + ')' - this expression is not left-recursive, since you have to get past an opening parenthesis before parsing a nested exp.

EDIT: Sorry, I was a little too quick on my earlier suggestion, here is the proper way to do your recursive expression parsing:

from pyparsing import *

exp = Forward()
LPAR, RPAR = map(Suppress, "()")
integer = Word(nums)
term = integer | Group(LPAR + exp + RPAR)
exp << term + ZeroOrMore('+' + term)

input = "(1+1) + 1"
print(exp.parseString(input))

prints

[['1', '+', '1'], '+', '1']

If you trace through the code, you'll see the recursion: exp is defined using term, and term is defined using exp. The fourFn.py example is closest to this style; since writing that, I've added the infixNotation method to pyparsing, which would allow you to write:

exp = infixNotation(integer, [
                    ('+', 2, opAssoc.LEFT),
                    ])

infixNotation takes care of the recursive definitions internally, implicitly defines the '(' + exp + ')' expression, and makes it easy to implement a system of operators with precedence of operations.

Upvotes: 1

Related Questions