lesnar_56

Reputation: 105

pyparsing error

I am stuck on this error in pyparsing:

from pyparsing import Word, alphas, Regex, StringEnd
ws = Regex(r'\s*')  # raw string, so the backslash reaches the regex engine unchanged
dot = "."
w = Word(alphas) + (ws | dot) + StringEnd()
w.leaveWhitespace()
w.parseString('AMIT.')  # raises ParseException

Returns the following error:

ParseException: Expected end of text (at char 4), (line:1, col:5)

Upvotes: 6

Views: 6003

Answers (1)

PaulMcG

Reputation: 63709

| creates a "match first" expression, not "match longest".

The first alternative is the regex, which matches zero or more whitespace characters. At the '.', it succeeds by matching the empty string, so the dot alternative is never tried.

The next element to parse is then StringEnd, but the parse position is still at the '.', so the parse fails.

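Here is more detailed output, obtained by adding setDebug() calls to your grammar expressions (dot has to be a pyparsing expression such as Literal(".") for this, since a plain string has no setDebug method):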

>>> w = Word(alphas).setDebug() + (ws.setDebug() | dot.setDebug()) + StringEnd()
>>> w.parseString('AMIT.')
Match W:(abcd...) at loc 0(1,1)
Matched W:(abcd...) -> ['AMIT']
Match Re:('\\s*') at loc 4(1,5)
Matched Re:('\\s*') -> ['']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\python26\lib\site-packages\pyparsing-1.5.6-py2.6.egg\pyparsing.py", line 1032, in parseString
    raise exc
pyparsing.ParseException: Expected end of text (at char 4), (line:1, col:5)
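The match-first behavior traced above can be reproduced in isolation. This is a minimal sketch, assuming dot is wrapped as Literal(".") and using the illustrative names first and longest, which are not part of the original grammar:

```python
from pyparsing import Literal, Regex

ws = Regex(r"\s*")   # can succeed without consuming anything
dot = Literal(".")

first = ws | dot     # MatchFirst: stops at the first alternative that succeeds
longest = ws ^ dot   # Or: tries every alternative and keeps the longest match

# With "|", ws wins by matching the empty string, so the "." is left unparsed.
first_result = first.parseString(".").asList()

# With "^", dot wins because it consumes one character and ws consumes none.
longest_result = longest.parseString(".").asList()

print(first_result)
print(longest_result)
```

The first result does not contain the dot, while the second one does, which is the difference between the two operators.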

To get your grammar to work you could:

  • change the | operator to ^ (match longest instead of match first)

  • change the regex to \s+ instead of \s* (so that at least one whitespace character is required for a match)

  • change your second term to Optional(dot)
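The three options above could look like the following sketch. The variable names are illustrative, dot is assumed to be Literal("."), and the third variant deliberately leaves out leaveWhitespace() so that pyparsing's default whitespace skipping applies:

```python
from pyparsing import Word, alphas, Regex, Literal, Optional, StringEnd

dot = Literal(".")

# Option 1: "^" tries every alternative and keeps the longest match.
longest = Word(alphas) + (Regex(r"\s*") ^ dot) + StringEnd()
longest.leaveWhitespace()

# Option 2: r"\s+" cannot match the empty string, so "|" falls through to dot.
one_or_more = Word(alphas) + (Regex(r"\s+") | dot) + StringEnd()
one_or_more.leaveWhitespace()

# Option 3: no explicit whitespace term at all; the dot is simply optional.
optional_dot = Word(alphas) + Optional(dot) + StringEnd()

results = [
    grammar.parseString("AMIT.").asList()
    for grammar in (longest, one_or_more, optional_dot)
]
for result in results:
    print(result)  # each variant should yield ['AMIT', '.']
```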

In general, explicit testing for whitespace is not consistent with the pyparsing philosophy: pyparsing skips whitespace between expressions by default, so it does not behave the same as re.
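As a rough illustration of that difference, the sketch below compares a grammar that relies on pyparsing's default whitespace skipping with a regular expression of similar shape. The pattern and the sample input are assumptions chosen for the comparison, not taken from the question:

```python
import re
from pyparsing import Word, alphas, Optional, Literal, StringEnd

# No leaveWhitespace() call: whitespace between expressions is skipped by default.
grammar = Word(alphas) + Optional(Literal(".")) + StringEnd()

text = "AMIT ."

# pyparsing skips the space before trying to match the dot.
pyparsing_tokens = grammar.parseString(text).asList()

# A regex of similar shape has no implicit skipping, so the space blocks the match.
regex_match = re.fullmatch(r"[A-Za-z]+\.?", text)

print(pyparsing_tokens)
print(regex_match)
```

To accept the same input, the regex would need an explicit \s* between the word and the dot, which is the bookkeeping pyparsing handles for you.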

Upvotes: 7
