Labrador

Reputation: 673

apparent pyparsing bug with 'ZeroOrMore'

I'm using pyparsing with Python 3.6.5 on a Mac. The following code raises an exception on the second parse:

from pyparsing import *

a = Word(alphas) + Literal(';')
b = Word(alphas) + Optional(Literal(';'))
bad_parser = ZeroOrMore(a) + b

b.parseString('hello;')
print("no problems yet...")
bad_parser.parseString('hello;')
print("this will not print because we're dead")

Is this logical behavior? Or is it a bug?


EDIT: Here is the full console output:

no problems yet...
Traceback (most recent call last):
  File "test.py", line 9, in <module>
    bad_parser.parseString('hello;')
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyparsing.py", line 1632, in parseString
    raise exc
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyparsing.py", line 1622, in parseString
    loc, tokens = self._parse( instring, 0 )
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyparsing.py", line 1379, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyparsing.py", line 3395, in parseImpl
    loc, exprtokens = e._parse( instring, loc, doActions )
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyparsing.py", line 1379, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyparsing.py", line 2689, in parseImpl
    raise ParseException(instring, loc, self.errmsg, self)
pyparsing.ParseException: Expected W:(ABCD...) (at char 6), (line:1, col:7)

Upvotes: 2

Views: 350

Answers (1)

PaulMcG

Reputation: 63747

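This is expected behavior. Pyparsing does no implicit lookahead; it parses strictly left to right, and a repetition such as ZeroOrMore takes every match it can without giving any of them back. You can add lookahead to your parser, but it is something you have to do for yourself.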

You can get some more insight into what is happening if you turn on debugging for a and b:

a.setName('a').setDebug()
b.setName('b').setDebug()

This shows every location where pyparsing tries to match each expression, whether the attempt succeeded or failed, and, on success, the matched tokens:

Match a at loc 0(1,1)
Matched a -> ['hello', ';']
Match a at loc 6(1,7)
Exception raised:Expected W:(ABCD...) (at char 6), (line:1, col:7)
Match b at loc 6(1,7)
Exception raised:Expected W:(ABCD...) (at char 6), (line:1, col:7)

The first a matches the complete input string, which satisfies "zero or more". Pyparsing then tries one more a at the end of the string, which fails, so the repetition ends and pyparsing moves on to b. But the word and semicolon have already been consumed, so there is nothing left to parse. Since b as a whole is not optional (only its trailing semicolon is), pyparsing raises an exception saying that the word could not be found. Even if you were to parse "hello; hello; hello;", all the words and semicolons would be consumed by the ZeroOrMore, with no trailing b left to parse.
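
To see that greedy behavior without the debug output, here is a small sketch that reuses a and b from the question. The names consumed and failure_loc are only illustrative, not part of pyparsing:

```python
from pyparsing import (Word, Literal, Optional, ZeroOrMore,
                       ParseException, alphas)

a = Word(alphas) + Literal(';')
b = Word(alphas) + Optional(Literal(';'))

# On its own, the repetition takes the whole input.
consumed = ZeroOrMore(a).parseString('hello;').asList()
print(consumed)  # ['hello', ';']

# With b appended, nothing is left for b, and the repetition
# does not give back what it already matched.
bad_parser = ZeroOrMore(a) + b
try:
    bad_parser.parseString('hello;')
    failure_loc = None
except ParseException as exc:
    # exc.loc is the location where the failing match was attempted
    # (the traceback in the question reports char 6).
    failure_loc = exc.loc
print(failure_loc)
```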

Try this:

not_so_bad_parser = ZeroOrMore(a + ~StringEnd()) + b

This states that you only want to read a expressions that are not at the end of the string. When parsing "hello;", the a inside the repetition therefore does not match, so pyparsing proceeds to b, which then matches.
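
As a quick check of that parser, the sketch below runs it on the original input and on a longer one. The variable names single and several are just for illustration:

```python
from pyparsing import Word, Literal, Optional, ZeroOrMore, StringEnd, alphas

a = Word(alphas) + Literal(';')
b = Word(alphas) + Optional(Literal(';'))

# Only accept an a that is NOT followed by the end of the string.
not_so_bad_parser = ZeroOrMore(a + ~StringEnd()) + b

# The lone "hello;" is rejected by the repetition and matched by b.
single = not_so_bad_parser.parseString('hello;').asList()
print(single)   # ['hello', ';']

# The first two groups go to the repetition, the last one to b.
several = not_so_bad_parser.parseString('hello; hello; hello;').asList()
print(several)  # ['hello', ';', 'hello', ';', 'hello', ';']
```

The token lists look the same as what a greedy repetition would produce; the difference is that the final group is now left for b, so the overall parse succeeds.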

This is so prevalent an issue that I added the stopOn keyword to the ZeroOrMore and OneOrMore class constructors, to avoid the need to add the overt ~ (meaning NotAny). At first I thought this might work:

even_less_bad_parser = ZeroOrMore(a, stopOn=b) + b

But b also matches anywhere that a matches, so the repetition would stop before matching any a at all, and the parser may leave text unmatched. We need to stop on b only if it is at the end of the string:

even_less_bad_parser = ZeroOrMore(a, stopOn=b + StringEnd()) + b
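
To compare the two stopOn variants side by side, here is a minimal sketch. The names stops_too_soon, too_eager and fixed are made up for this example, and it assumes a pyparsing version that accepts the stopOn keyword as used above:

```python
from pyparsing import Word, Literal, Optional, ZeroOrMore, StringEnd, alphas

a = Word(alphas) + Literal(';')
b = Word(alphas) + Optional(Literal(';'))

# stopOn=b: b matches wherever a would, so the repetition stops at once
# and the second "hello;" is silently left unparsed.
stops_too_soon = ZeroOrMore(a, stopOn=b) + b
too_eager = stops_too_soon.parseString('hello; hello;').asList()
print(too_eager)  # ['hello', ';']

# stopOn=b + StringEnd(): stop only when the next b is the last thing
# in the input, so earlier groups still go to the repetition.
even_less_bad_parser = ZeroOrMore(a, stopOn=b + StringEnd()) + b
fixed = even_less_bad_parser.parseString('hello; hello;').asList()
print(fixed)      # ['hello', ';', 'hello', ';']
```

Calling parseString with parseAll=True on the first parser would turn the silently ignored text into an exception, which is one way to notice this kind of problem.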

I'm not sure if that will truly satisfy your concept of "less bad"-ness, but that is why pyparsing is behaving as it is for you.

Upvotes: 4
