Reputation: 2885
I want to parse:
'APPLE BANANA FOO TEST BAR'
into:
[['APPLE BANANA'], 'FOO', ['TEST BAR']]
Here is my latest attempt:
from pyparsing import Group, Keyword, Word, ZeroOrMore, alphas
to_parse = 'APPLE BANANA FOO TEST BAR'
words = Word(alphas)
foo = Keyword("FOO")
parser = Group(ZeroOrMore(words + ~foo)) + foo + Group(ZeroOrMore(words))
result = parser.parseString(to_parse)
But it raises the following error:
> raise ParseException(instring, loc, self.errmsg, self)
E pyparsing.ParseException: Expected "FOO" (at char 6), (line:1, col:7)
I think the problem comes from ZeroOrMore(words + ~foo), which is "too greedy". According to a few questions on SO, the solution is to use negation with ~foo, but it doesn't work in this case. Any help would be appreciated.
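To see where the attempt above breaks, here is a small sketch that runs the leading ZeroOrMore(words + ~foo) on its own and then the full parser. It assumes pyparsing is installed and uses the camelCase names from this thread (parseString, asList), which pyparsing 3.x still keeps as aliases; the names leading_tokens and error_loc are just for illustration.

```python
from pyparsing import Group, Keyword, ParseException, Word, ZeroOrMore, alphas

to_parse = 'APPLE BANANA FOO TEST BAR'
words = Word(alphas)
foo = Keyword("FOO")

# ~foo is checked *after* each word, i.e. it inspects the token that follows.
# After "APPLE" it sees "BANANA" (ok); after "BANANA" it sees "FOO", so the
# second repetition fails as a whole and ZeroOrMore stops with just ['APPLE'].
leading = ZeroOrMore(words + ~foo)
leading_tokens = leading.parseString(to_parse).asList()
print(leading_tokens)  # ['APPLE']

# The full parser then expects FOO where "BANANA" starts (char 6).
parser = Group(ZeroOrMore(words + ~foo)) + foo + Group(ZeroOrMore(words))
error_loc = None
try:
    parser.parseString(to_parse)
except ParseException as err:
    error_loc = err.loc  # location reported by the exception
print(error_loc)  # 6
```

This suggests the loop stops too early rather than too late: the lookahead placed after the word vetoes "BANANA", which is why the error points at char 6.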
Upvotes: 1
Views: 616
Reputation: 63782
You are definitely on the right track. You just need to do the negative lookahead of foo before parsing each word, rather than after it:
parser = Group(ZeroOrMore(~foo + words)) + foo + Group(ZeroOrMore(words))
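A self-contained version of that fix, as a sketch assuming pyparsing is installed. With the lookahead first, each repetition checks the upcoming token before consuming it, so FOO is left for the explicit foo that follows:

```python
from pyparsing import Group, Keyword, Word, ZeroOrMore, alphas

words = Word(alphas)
foo = Keyword("FOO")

# ~foo runs *before* each word: the loop ends as soon as the next token is FOO.
parser = Group(ZeroOrMore(~foo + words)) + foo + Group(ZeroOrMore(words))

result = parser.parseString('APPLE BANANA FOO TEST BAR')
print(result.asList())  # [['APPLE', 'BANANA'], 'FOO', ['TEST', 'BAR']]
```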
In recent pyparsing releases, I added a stopOn argument to ZeroOrMore and OneOrMore that does the same thing, to make this less error-prone:
parser = Group(ZeroOrMore(words, stopOn=foo)) + foo + Group(ZeroOrMore(words))
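Since OneOrMore accepts stopOn as well, a hypothetical variant could swap it in for the first group when at least one word must appear before FOO. A minimal sketch, assuming pyparsing is installed; the variable names ok and rejected are illustrative only:

```python
from pyparsing import Group, Keyword, OneOrMore, ParseException, Word, ZeroOrMore, alphas

words = Word(alphas)
foo = Keyword("FOO")

# Hypothetical variant: OneOrMore with stopOn requires >= 1 word before FOO.
parser = Group(OneOrMore(words, stopOn=foo)) + foo + Group(ZeroOrMore(words))

ok = parser.parseString('APPLE BANANA FOO TEST BAR').asList()
print(ok)  # [['APPLE', 'BANANA'], 'FOO', ['TEST', 'BAR']]

# With nothing before FOO, the stopOn check fires before the first match,
# so OneOrMore (and therefore the whole parser) fails.
try:
    parser.parseString('FOO TEST BAR')
    rejected = False
except ParseException:
    rejected = True
print(rejected)  # True
```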
With this change I get:
>>> result.asList()
[['APPLE', 'BANANA'], 'FOO', ['TEST', 'BAR']]
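The question asked for the words joined, as in [['APPLE BANANA'], 'FOO', ['TEST BAR']]. One way to get that (an assumption on my part, not part of the answer above) is a parse action that joins each group's words with a single space. The helper join_words is hypothetical, and joining normalizes whitespace; if the original spacing matters, pyparsing's originalTextFor is an alternative.

```python
from pyparsing import Group, Keyword, Word, ZeroOrMore, alphas

words = Word(alphas)
foo = Keyword("FOO")

def join_words(tokens):
    """Hypothetical helper: collapse the matched words into one string."""
    return " ".join(tokens)

# Attach the parse action to each ZeroOrMore, then Group the joined string.
before = ZeroOrMore(words, stopOn=foo).setParseAction(join_words)
after = ZeroOrMore(words).setParseAction(join_words)
parser = Group(before) + foo + Group(after)

result = parser.parseString('APPLE BANANA FOO TEST BAR')
print(result.asList())  # [['APPLE BANANA'], 'FOO', ['TEST BAR']]
```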
Upvotes: 1