Reputation:
I'm trying to write something that will parse some code. I'm able to successfully parse foo(spam) and spam+eggs, but foo(spam+eggs) (recursive descent? my terminology from compilers is a bit rusty) fails.
I have the following code:
from pyparsing_py3 import *
myVal = Word(alphas+nums+'_')
myFunction = myVal + '(' + delimitedList( myVal ) + ')'
myExpr = Forward()
mySubExpr = ( \
myVal \
| (Suppress('(') + Group(myExpr) + Suppress(')')) \
| myFunction \
)
myExpr << Group( mySubExpr + ZeroOrMore( oneOf('+ - / * =') + mySubExpr ) )
# SHOULD return: [blah, [foo, +, bar]]
# but actually returns: [blah]
print(myExpr.parseString('blah(foo+bar)'))
Upvotes: 2
Views: 1379
Reputation: 63709
I've found that a good habit to get into when using the '<<' operator with Forwards is to always enclose the RHS in parentheses. That is:
myExpr << mySubExpr + ZeroOrMore( oneOf('+ - / * =') + mySubExpr )
is better as:
myExpr << ( mySubExpr + ZeroOrMore( oneOf('+ - / * =') + mySubExpr ) )
This is a result of my unfortunate choice of '<<' as the "insertion" operator for inserting the expression into a Forward. The parentheses are unnecessary in this particular case, but in this one:
integer = Word(nums)
myExpr << mySubExpr + ZeroOrMore( oneOf('+ - / * =') + mySubExpr ) | integer
we see why I say "unfortunate". If I simplify this to "A << B | C", we easily see that the precedence of operations causes evaluation to be performed as "(A << B) | C", since '<<' has higher precedence than '|'. The result is that the Forward A only gets the expression B inserted in it. The "| C" part does get executed, but what happens is that you get "A | C", which creates a MatchFirst object that is then immediately discarded, since it is not assigned to any variable name. The solution is to group the right-hand side within parentheses, as "A << (B | C)". In expressions composed only using '+' operations, there is no actual need for the parentheses, since '+' has a higher precedence than '<<'. But this is just lucky coding, and it causes problems when someone later adds an alternative expression using '|' and doesn't realize the precedence implications. So I suggest just adopting the style "A << (expression)" to help avoid this confusion.
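To check the precedence claim without involving pyparsing at all, you can ask Python's own parser how it groups these expressions. This is only a sketch using the standard ast module; the helper name top_level_operator is made up for illustration, and A, B, C are placeholder names that are parsed but never evaluated.

```python
import ast

def top_level_operator(source: str) -> str:
    """Return the name of the binary operator Python applies last in `source`."""
    node = ast.parse(source, mode="eval").body
    return type(node.op).__name__

# '|' is applied last, so this is grouped as (A << B) | C
print(top_level_operator("A << B | C"))    # BitOr

# With explicit parentheses, '<<' is applied last and sees the whole (B | C)
print(top_level_operator("A << (B | C)"))  # LShift

# '+' binds tighter than '<<', so no parentheses are needed here
print(top_level_operator("A << B + C"))    # LShift
```

The operator reported is the one evaluated last, so BitOr in the first case means the '<<' has already run, with only B on its right-hand side, by the time "| C" is applied.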
(Someday I will write pyparsing 2.0 - which will allow me to break compatibility with existing code - and change this to use the '<<=' operator, which fixes all of these precedence issues, since '<<=' has lower precedence than any of the other operators used by pyparsing.)
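To see the difference between the two operators in action, here is a toy stand-in. It is not pyparsing's implementation: Expr and ToyForward are hypothetical classes written only for this illustration, and they record expression names as strings instead of building real parsers.

```python
class Expr:
    """Hypothetical stand-in for a parser element; it only tracks a name."""
    def __init__(self, name):
        self.name = name

    def __or__(self, other):
        return Expr(f"({self.name} | {other.name})")

    def __add__(self, other):
        return Expr(f"({self.name} + {other.name})")


class ToyForward(Expr):
    """Hypothetical stand-in for a Forward; it stores whatever is inserted."""
    def __init__(self):
        super().__init__("Forward")
        self.expr = None

    def __lshift__(self, other):
        self.expr = other
        return self

    def __ilshift__(self, other):
        return self << other


B, C = Expr("B"), Expr("C")

fwd_shift = ToyForward()
fwd_shift << B | C            # grouped as (fwd_shift << B) | C; the '| C' result is discarded
print(fwd_shift.expr.name)    # B

fwd_aug = ToyForward()
fwd_aug <<= B | C             # the whole right-hand side is evaluated first
print(fwd_aug.expr.name)      # (B | C)
```

Because '<<=' is an augmented assignment statement rather than an operator inside an expression, everything to its right is evaluated before the insertion happens.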
Upvotes: 4
Reputation: 881487
Several issues: delimitedList is looking for a comma-delimited list of myVal, i.e. identifiers, as the only acceptable form of argument list, so of course it can't match 'foo+bar' (not a comma-delimited list of myVal!); fixing that reveals another -- myVal and myFunction start the same way, so their order in mySubExpr matters; fixing that reveals yet another -- TWO levels of nesting instead of one. This version seems OK:
myVal = Word(alphas+nums+'_')
myExpr = Forward()
mySubExpr = (
(Suppress('(') + Group(myExpr) + Suppress(')'))
| myVal + Suppress('(') + Group(delimitedList(myExpr)) + Suppress(')')
| myVal
)
myExpr << mySubExpr + ZeroOrMore( oneOf('+ - / * =') + mySubExpr )
print(myExpr.parseString('blah(foo+bar)'))
emits ['blah', ['foo', '+', 'bar']] as desired. I also removed the redundant backslashes, since logical line continuation occurs anyway within parentheses; they were innocuous but did hamper readability.
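The ordering point can be seen in miniature with the standard re module, whose '|' also tries alternatives left to right and stops at the first one that matches. This is only an analogy, not pyparsing code: the two patterns below are made up for illustration, with "\w+" playing the role of myVal and "\w+\(\w+\)" the role of a function call.

```python
import re

text = "blah(foo)"

# Short alternative first: it matches 'blah' and the call form is never tried
short_first = re.match(r"\w+|\w+\(\w+\)", text).group()

# Longer, more specific alternative first: the whole call is matched
call_first = re.match(r"\w+\(\w+\)|\w+", text).group()

print(short_first)  # blah
print(call_first)   # blah(foo)
```

That is the same reason the function-call alternative is listed before the bare myVal in mySubExpr above.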
Upvotes: 4