Aaron Watters

Reputation: 2846

Trouble doing simple parse in pyparsing

I'm having a basic problem using pyparsing. Below is the test program and the output of the run.

aaron-mac:sql aaron$ more s.py

from pyparsing import *

n = Word(alphanums)
a = Group( n | Group( n + OneOrMore( Suppress(",") + n )))
p = Group( a + Suppress(".") )
print a.parseString("first")
print a.parseString("first,second")
print p.parseString("first.")
print p.parseString("first,second.")


aaron-mac:sql aaron$ python s.py
[['first']]
[['first']]
[[['first']]]
Traceback (most recent call last):
 File "s.py", line 15, in <module>
   print p.parseString("first,second.")
 File "/Library/Python/2.6/site-packages/pyparsing.py", line 1032, in parseString
   raise exc
pyparsing.ParseException: Expected "." (at char 5), (line:1, col:6)
aaron-mac:sql aaron$ 

How do I modify the grammar in the test program to parse a list of comma-separated names terminated by a period? I looked in the docs and tried to find a live support list, but decided I was most likely to get a response here.

Upvotes: 6

Views: 2220

Answers (2)

PaulMcG

Reputation: 63709

The '|' operator creates a MatchFirst expression, in which the alternatives are tried in order, left to right, and the first one that matches is used.

Pyparsing works purely left-to-right, applying parser expressions to the input string as it goes. The only lookahead that pyparsing does is whatever you explicitly write into the parser.

In this expression:

a = Group( n | Group( n + OneOrMore( Suppress(",") + n )))

Let's say n is just a literal "X". If this parser were given the input string "X", it would obviously match the leading, lone n expression. If given the string "X,X,X", it would still match just the leading n, because that is the first alternative in the parser, and MatchFirst stops there without ever trying the list alternative.

If you turn the expression around to:

a = Group( Group( n + OneOrMore( Suppress(",") + n )) | n)

then to parse "X" it would first try to match the list, which will fail, and then match the lone n. To parse "X,X,X", the first alternative will be the list expression, which will match.
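To make the ordering effect concrete, here is a minimal sketch that runs both orderings side by side. The names single_first, list_first, and name_list are only illustrative and are not from the question, and the explicit imports and .asList() calls are there so the snippet runs the same way on Python 2 and 3.

```python
from pyparsing import Group, OneOrMore, Suppress, Word, alphanums

n = Word(alphanums)
# The "two or more names" alternative, factored out for readability.
name_list = Group(n + OneOrMore(Suppress(",") + n))

single_first = Group(n | name_list)  # original order: lone name tried first
list_first = Group(name_list | n)    # reordered: list tried first

# The lone n matches "first" and MatchFirst stops there.
print(single_first.parseString("first,second").asList())  # [['first']]

# The list alternative is tried first and consumes both names.
print(list_first.parseString("first,second").asList())    # [[['first', 'second']]]

# The list alternative fails on a single name, so the lone n is used.
print(list_first.parseString("first").asList())           # [['first']]
```

Note that parseString does not require the whole input to be consumed by default, which is why the original order "succeeds" on "first,second" while silently leaving ",second" unparsed.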

If you want the longest alternative to match, use the '^' operator, which gives an Or expression. Or will evaluate all the given alternatives, and then select the longest match.

a = Group( n ^ Group( n + OneOrMore( Suppress(",") + n )))
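As a quick check, this sketch plugs the '^' version of a into the p expression from the question. It should parse both period-terminated inputs; the deep nesting in the results comes from the stacked Group calls in the original grammar, which are kept here unchanged.

```python
from pyparsing import Group, OneOrMore, Suppress, Word, alphanums

n = Word(alphanums)
# Or ('^') evaluates both alternatives and keeps the longest match.
a = Group(n ^ Group(n + OneOrMore(Suppress(",") + n)))
p = Group(a + Suppress("."))

print(p.parseString("first.").asList())         # [[['first']]]
print(p.parseString("first,second.").asList())  # [[[['first', 'second']]]]
```

The trade-off is that Or evaluates every alternative at that position, so for grammars with many alternatives, reordering a MatchFirst is usually the cheaper fix.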

You can also simplify this using the pyparsing helper method delimitedList. Parsing lists of the same expression separated by commas is a common case, so rather than see people have to reinvent expr + ZeroOrMore(Suppress(",") + expr) over and over, I added delimitedList as a standard pyparsing helper. delimitedList("X") would match both "X" and "X,X,X".
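Applied to the grammar in the question, a sketch using delimitedList might look like the following. It assumes the name delimitedList is available in your installed pyparsing; newer 3.x releases also offer the same helper under PEP 8 style names, so check your version if the import fails.

```python
from pyparsing import Suppress, Word, alphanums, delimitedList

n = Word(alphanums)
# delimitedList matches one or more n separated by commas
# and suppresses the commas in the results by default.
p = delimitedList(n) + Suppress(".")

print(p.parseString("first.").asList())         # ['first']
print(p.parseString("first,second.").asList())  # ['first', 'second']
```

Wrap the list in Group(...) if you want the names returned as a nested list rather than flat tokens.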

Upvotes: 6

jcollado

Reputation: 40374

If you just want to cover the case of a comma-separated list of names terminated by a period, you can use the following:

from pyparsing import *
p = Word(alphanums) + ZeroOrMore(Suppress(",") + Word(alphanums)) + Suppress(".")

With this you get the following results:

>>> print p.parseString("first.")
['first']
>>> print p.parseString("first,second.")
['first', 'second']

The other inputs in your question ("first" and "first,second") fail against this grammar because they don't end with a period.

Upvotes: 2
