Reputation: 15679
I'm trying to understand the Forward()
element from pyparsing. Suppose I have this simple BNF:
identifier =
"a..z,$,_" < "a..z,$,_,0..9" >
package_name =
identifier
/ ( package_name "." identifier )
and I try to parse a simple package like java.lang.String
I get either just java
as result or never return from recursion.
I tried it like this:
from pyparsing import alphas,alphanums, Word, Forward, ZeroOrMore, Group, Literal
identifier=Word(alphas+"$_",alphanums+"$_")
dot=Literal(".")
package_name = Forward()
definition = package_name+dot+identifier
package_name << Group(identifier+ZeroOrMore(definition))
package_name.parseString("java.lang.String")
will print [['java']]
from pyparsing import alphas,alphanums, Word, Forward, ZeroOrMore, Group, Literal
identifier=Word(alphas+"$_",alphanums+"$_")
dot=Literal(".")
package_name = Forward()
definition = identifier^package_name+dot+identifier
package_name << definition
package_name.parseString("java.lang.String")
will reach recursion limit
how does this Forward
placeholder work?
Upvotes: 6
Views: 2428
Reputation: 251365
The problem is not with Forward
but with your grammar, which is inherently either limited too early, or recursive in a way that is undecidable with a naive recursive descent parser like Pyparsing.
You have this:
package_name = identifier | (package_name "." identifier )
If you match left to right, this will always match a single identifier and then stop, without attempting to match a following period. If you switch the order to match the identifier
last:
package_name = (package_name "." identifier) | identifier
. . . then it will infinitely recurse, because in order to decide if package_name
matches, the first thing it has to do is decide whether package_name
matches. This is a left-recursive grammar, which a simple recursive-descent parser like Pyparsing can't handle. Pyparsing does not look ahead to see how a match will affect subsequent matches. It just tries the matches left to right.
You can get a simple example of how Forward
works by changing the way your grammar recurses:
identifier = pyp.Word(pyp.alphas+"$_", pyp.alphanums+"$_")
package_name = pyp.Forward()
package_name << ((identifier + '.' + package_name) | identifier)
>>> package_name.parseString("java.lang.String")
[u'java', u'.', u'lang', u'.', u'String'], {})
Here, the recursion happens on the right, not on the left, so Pyparsing can match it incremenetally.
(Your use of ZeroOrMore is a red herring. If you're going to have a recursive grammar like this, you don't want to use ZeroOrMore, because the recursive definition already allows your sub-expression to match multiple times. As I suggested in my comment, though, it is much simpler to define this sort of grammar without recursion anyway.)
Upvotes: 15