Reputation: 673
I'm using pyparsing with Python 3.6.5 on a Mac. The following code raises an exception on the second parse:
from pyparsing import *
a = Word(alphas) + Literal(';')
b = Word(alphas) + Optional(Literal(';'))
bad_parser = ZeroOrMore(a) + b
b.parseString('hello;')
print("no problems yet...")
bad_parser.parseString('hello;')
print("this will not print because we're dead")
Is this logical behavior? Or is it a bug?
EDIT: Here is the full console output:
no problems yet...
Traceback (most recent call last):
File "test.py", line 9, in <module>
bad_parser.parseString('hello;')
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyparsing.py", line 1632, in parseString
raise exc
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyparsing.py", line 1622, in parseString
loc, tokens = self._parse( instring, 0 )
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyparsing.py", line 1379, in _parseNoCache
loc,tokens = self.parseImpl( instring, preloc, doActions )
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyparsing.py", line 3395, in parseImpl
loc, exprtokens = e._parse( instring, loc, doActions )
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyparsing.py", line 1379, in _parseNoCache
loc,tokens = self.parseImpl( instring, preloc, doActions )
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyparsing.py", line 2689, in parseImpl
raise ParseException(instring, loc, self.errmsg, self)
pyparsing.ParseException: Expected W:(ABCD...) (at char 6), (line:1, col:7)
Upvotes: 2
Views: 350
Reputation: 63747
This is expected behavior. Pyparsing does not do any lookahead, but is purely left-to-right. You can add lookahead to your parser, but it is something you have to do for yourself.
You can get some more insight into what is happening if you turn on debugging for a and b:
a.setName('a').setDebug()
b.setName('b').setDebug()
This shows every location where pyparsing is about to match one of these expressions, whether the match failed or succeeded, and, if it succeeded, the matching tokens:
Match a at loc 0(1,1)
Matched a -> ['hello', ';']
Match a at loc 6(1,7)
Exception raised:Expected W:(ABCD...) (at char 6), (line:1, col:7)
Match b at loc 6(1,7)
Exception raised:Expected W:(ABCD...) (at char 6), (line:1, col:7)
Since a matches the complete input string, the "zero or more" criterion is satisfied. Pyparsing then proceeds to match b, but since the word and semicolon have already been consumed, there is nothing left to parse. Since b is not optional, pyparsing raises an exception saying that it could not be found. Even if you were to parse "hello; hello; hello;", all the words and semicolons would be consumed by the ZeroOrMore, with no trailing b left to parse.
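To see this for yourself, here is a minimal sketch using the same a and b as in the question. The names consumed and failed are only for illustration, and I am assuming a pyparsing version that still accepts the camelCase parseString API:

```python
from pyparsing import Literal, Optional, ParseException, Word, ZeroOrMore, alphas

a = Word(alphas) + Literal(';')
b = Word(alphas) + Optional(Literal(';'))

text = 'hello; hello; hello;'

# On its own, the repetition consumes every word and semicolon.
consumed = ZeroOrMore(a).parseString(text).asList()
print(consumed)  # ['hello', ';', 'hello', ';', 'hello', ';']

# With b appended, nothing is left for b, and pyparsing does not back up
# to hand the last 'hello;' over to it.
try:
    (ZeroOrMore(a) + b).parseString(text)
    failed = False
except ParseException as exc:
    failed = True
    print("parse failed:", exc)
```

The exact wording of the exception message may differ between pyparsing versions.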
Try this:
not_so_bad_parser = ZeroOrMore(a + ~StringEnd()) + b
By stating that you only want to read an a when it is not at the end of the string, parsing "hello;" will not match a, so pyparsing proceeds to b, which then matches.
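A short sketch of how that lookahead version behaves on a few inputs. The test strings and the variable names single, several and no_trailing_semi are my own, not from the question:

```python
from pyparsing import Literal, Optional, StringEnd, Word, ZeroOrMore, alphas

a = Word(alphas) + Literal(';')
b = Word(alphas) + Optional(Literal(';'))

# Only accept an a if something other than end-of-string follows it.
not_so_bad_parser = ZeroOrMore(a + ~StringEnd()) + b

# The only item is at the end of the string, so it is left for b.
single = not_so_bad_parser.parseString('hello;').asList()
print(single)  # ['hello', ';']

# The first two items are read as a, the last one as b.
several = not_so_bad_parser.parseString('hello; world; bye;').asList()
print(several)  # ['hello', ';', 'world', ';', 'bye', ';']

# A final item without a semicolon can only match b anyway.
no_trailing_semi = not_so_bad_parser.parseString('hello; world').asList()
print(no_trailing_semi)  # ['hello', ';', 'world']
```

Note that the tokens all end up in one flat list, so you cannot tell from the result alone which ones came from a and which from b. If that matters, wrapping a and b in Group is one option.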
This is so prevalent an issue that I added the stopOn keyword to the ZeroOrMore and OneOrMore class constructors, to avoid the need to add the overt ~ (meaning NotAny). At first I thought this might work:
even_less_bad_parser = ZeroOrMore(a, stopOn=b) + b
But then, since b also matches anything that a matches, the stop condition is met before every a, so this will effectively never match any a's, and may leave unmatched text. We need to stop on b only if it is at the end of the string:
even_less_bad_parser = ZeroOrMore(a, stopOn=b + StringEnd()) + b
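Here is a hedged sketch comparing the two stopOn variants on a two-item input. The names too_eager, partial, full and leftover_detected are made up for the example, and parseAll=True is only there to make the leftover text visible:

```python
from pyparsing import (Literal, Optional, ParseException, StringEnd, Word,
                       ZeroOrMore, alphas)

a = Word(alphas) + Literal(';')
b = Word(alphas) + Optional(Literal(';'))

text = 'hello; world;'

# stopOn=b stops the repetition before the very first a, because b
# matches wherever a would. b then reads 'hello;' and the rest is ignored.
too_eager = ZeroOrMore(a, stopOn=b) + b
partial = too_eager.parseString(text).asList()
print(partial)  # ['hello', ';']

# Asking for the whole string to be matched exposes the leftover text.
try:
    too_eager.parseString(text, parseAll=True)
    leftover_detected = False
except ParseException:
    leftover_detected = True

# Stopping only when b is followed by end-of-string reads everything.
even_less_bad_parser = ZeroOrMore(a, stopOn=b + StringEnd()) + b
full = even_less_bad_parser.parseString(text).asList()
print(full)  # ['hello', ';', 'world', ';']
```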
I'm not sure if that will truly satisfy your concept of "less bad"-ness, but that is why pyparsing is behaving as it is for you.
Upvotes: 4