How to parse this string with PyParsing?

Question

I want to parse:

'APPLE BANANA FOO TEST BAR'

into:

[['APPLE BANANA'], 'FOO', ['TEST BAR']]

Here is my latest attempt:

from pyparsing import Group, Keyword, Word, ZeroOrMore, alphas

to_parse = 'APPLE BANANA FOO TEST BAR'
words = Word(alphas)
foo = Keyword("FOO")
parser = Group(ZeroOrMore(words + ~foo)) + foo + Group(ZeroOrMore(words))
result = parser.parseString(to_parse)

But it raises the following error:

>       raise ParseException(instring, loc, self.errmsg, self)
E       pyparsing.ParseException: Expected "FOO" (at char 6), (line:1, col:7)

I think the problem comes from ZeroOrMore(words + ~foo), which is "too greedy". According to a few questions on SO, the solution is to use negation with ~foo, but it doesn't work in this case. Any help would be appreciated.

PaulMcG · Accepted Answer

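You are definitely on the right track. You just need to do the negative lookahead of foo before parsing each words, not after it. With words + ~foo, the check runs after a word has been consumed: APPLE is followed by BANANA, so that repetition succeeds, but BANANA is followed by FOO, so that repetition fails and ZeroOrMore stops after APPLE. The parser then expects FOO at char 6, finds BANANA, and raises the error you saw. Putting ~foo first makes each repetition refuse to start at FOO (which Word(alphas) would otherwise happily match), so the repetition stops right before it: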

parser = Group(ZeroOrMore(~foo + words)) + foo + Group(ZeroOrMore(words))
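A self-contained sketch contrasting the two placements of the lookahead on the question's input, assuming pyparsing is installed (the name greedy_check is only illustrative):

```python
from pyparsing import Group, Keyword, Word, ZeroOrMore, alphas

to_parse = 'APPLE BANANA FOO TEST BAR'
words = Word(alphas)
foo = Keyword("FOO")

# Original attempt: the lookahead runs *after* each word, so the repetition
# that matches BANANA fails (BANANA is followed by FOO) and only APPLE is kept.
greedy_check = ZeroOrMore(words + ~foo)
print(greedy_check.parseString(to_parse).asList())  # ['APPLE']

# Fixed version: check "not FOO" *before* consuming each word.
parser = Group(ZeroOrMore(~foo + words)) + foo + Group(ZeroOrMore(words))
result = parser.parseString(to_parse)
print(result.asList())  # [['APPLE', 'BANANA'], 'FOO', ['TEST', 'BAR']]
```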

In recent pyparsing releases, I added a stopOn argument to ZeroOrMore and OneOrMore that does the same thing, to make this less error-prone:

parser = Group(ZeroOrMore(words, stopOn=foo)) + foo + Group(ZeroOrMore(words))
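A runnable version of the stopOn variant, again assuming pyparsing is installed. Passing parseAll=True is an extra choice here, not part of the original answer; it makes the parse fail if any trailing text is left unconsumed:

```python
from pyparsing import Group, Keyword, Word, ZeroOrMore, alphas

words = Word(alphas)
foo = Keyword("FOO")

# stopOn=foo makes ZeroOrMore test for FOO before each repetition,
# the same check as writing ZeroOrMore(~foo + words) by hand.
parser = Group(ZeroOrMore(words, stopOn=foo)) + foo + Group(ZeroOrMore(words))

result = parser.parseString('APPLE BANANA FOO TEST BAR', parseAll=True)
print(result.asList())  # [['APPLE', 'BANANA'], 'FOO', ['TEST', 'BAR']]
```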

With either change, I get:

>>> result.asList()
[['APPLE', 'BANANA'], 'FOO', ['TEST', 'BAR']]
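The question asked for each side as a single string, e.g. ['APPLE BANANA']. One way to get that exact shape is a parse action that joins the matched words; this is a sketch, and join_words is a hypothetical helper name, not part of pyparsing:

```python
from pyparsing import Group, Keyword, Word, ZeroOrMore, alphas

words = Word(alphas)
foo = Keyword("FOO")

def join_words(toks):
    """Hypothetical helper: collapse the matched words into one space-separated string."""
    return ' '.join(toks)

# The parse action runs on each repetition's tokens before Group wraps them.
before = ZeroOrMore(words, stopOn=foo).setParseAction(join_words)
after = ZeroOrMore(words).setParseAction(join_words)
parser = Group(before) + foo + Group(after)

result = parser.parseString('APPLE BANANA FOO TEST BAR', parseAll=True)
print(result.asList())  # [['APPLE BANANA'], 'FOO', ['TEST BAR']]
```

Joining with a single space normalizes the whitespace between words; if the original spacing must be preserved, pyparsing's originalTextFor is an alternative worth considering.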
