Reputation: 10384
With the parsing library pyparsing I want to analyse constructs like this:
123 456
-^- -^-
[A] B
where both parts A and B contain only digits and part A is optional. Here are some examples of how a parser for this would break strings down into their parts:
123 456 ==> A="123", B="456"
456 ==> A="", B="456"
123 ==> A="", B="123"
1 123 ==> A="1", B="123"
The naive approach to writing a parser looks like this:
import pyparsing as pp
a = pp.Optional(pp.Word(pp.nums)).setName("PART_A")
b = pp.Word(pp.nums).setName("PART_B")
expr = a('A') + b('B')
This parser works for "123 456", returning {'A': '123', 'B': '456'} as expected. However, it fails on "456" with:
ParseException:
Expected PART_B (at char 3), (line:1, col:4)
"456>!<"
This is understandable because the optional part A already consumes the text that should match part B, even though A is optional. My idea was to set a stopOn= option, but it would need to stop on an expression of the same type as the expression it wants to match...
Update: My second idea was to rewrite the Optional construct as an Or construct:
a = pp.Word(pp.nums).setName("PART_A")('A')
b = pp.Word(pp.nums).setName("PART_B")('B')
just_b = b
a_and_b = a + b
expr = pp.Or(just_b, a_and_b)
However, this now fails for texts of the form "123 456", despite the fact that a_and_b is an alternative in the Or class...
Any suggestion what to do?
Upvotes: 1
Views: 31
Reputation: 63709
You are misconstructing the Or; it should be:
expr = pp.Or([just_b, a_and_b])
The way you are constructing it, the Or is being built with just just_b, with a_and_b being passed as the boolean argument savelist.
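A minimal sketch of the difference, assuming pyparsing is imported as pp as in the question. The names broken and fixed are only illustrative, and the exprs attribute is inspected here just to show how many alternatives each Or ended up with:

```python
import pyparsing as pp

a = pp.Word(pp.nums).setName("PART_A")("A")
b = pp.Word(pp.nums).setName("PART_B")("B")
just_b = b
a_and_b = a + b

# Two positional arguments: only just_b becomes an alternative,
# and a_and_b lands in the savelist parameter.
broken = pp.Or(just_b, a_and_b)
print(len(broken.exprs))

# One list argument: both expressions become alternatives.
fixed = pp.Or([just_b, a_and_b])
print(len(fixed.exprs))

for text in ["123 456", "456", "1 123"]:
    result = fixed.parseString(text, parseAll=True)
    print(repr(text), "==>", result.asDict())
```

Note that when only B matches, the result has no 'A' entry at all; result.get("A", "") gives the empty string the question asks for.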
Please consider using the operator overloads to construct And, Or, MatchFirst, and Each expressions.
integer = pp.Word(pp.nums)
a = integer("A")
b = integer("B")
expr = a + b | b
The explicit style looks just so, well, Java-ish.
To answer the question in your title, you have pretty much solved this already: be sure to try matching the full a_and_b expression, either by placing it first in a MatchFirst (as my sample code does) or by using an Or expression (using the '^' operator, or by constructing an Or from a list of the just_b and a_and_b expressions).
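Both options can be sketched side by side. This is only an illustration: the helper split_parts and the variable names are made up for the example, and it fills in A="" when part A is absent, as in the question's examples:

```python
import pyparsing as pp

integer = pp.Word(pp.nums)
a = integer("A")
b = integer("B")

# MatchFirst ('|') tries alternatives left to right,
# so the longer a + b alternative has to come first.
first_match = a + b | b

# Or ('^') tries every alternative and keeps the longest match,
# so the order of the alternatives does not matter.
longest_match = b ^ a + b


def split_parts(expr, text):
    """Return (A, B) for text, with A defaulting to the empty string."""
    result = expr.parseString(text, parseAll=True)
    return result.get("A", ""), result["B"]


for text in ["123 456", "456", "123", "1 123"]:
    print(repr(text), split_parts(first_match, text), split_parts(longest_match, text))
```

With the MatchFirst order reversed (b | a + b), the first alternative would already succeed on "123 456" and the longer one would never be tried.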
Upvotes: 1