halloleo

Reputation: 10384

How to make the first of two syntactically equivalent parts optional when parsing with pyparsing

With the parsing library pyparsing I want to analyse constructs like this:

123 456

-^- -^-
[A]  B

where both parts A and B contain only digits and part A is optional. Here are some examples of how a parser for this would break strings down into their parts:

123 456 ==> A="123", B="456"
456     ==> A="",    B="456"
123     ==> A="",    B="123"
1 123   ==> A="1",   B="123"

The naive approach to writing a parser looks like this:

import pyparsing as pp

a = pp.Optional(pp.Word(pp.nums)).setName("PART_A")
b = pp.Word(pp.nums).setName("PART_B")
expr = a('A') + b('B')

This parser works for "123 456", returning {'A': '123', 'B': '456'} as expected. However, it fails on "456" with:

ParseException:
Expected PART_B (at char 3), (line:1, col:4)
"456>!<"

This is understandable: the optional part A greedily consumes the text that should match part B, and the parser does not backtrack to let A match nothing instead. My idea was to set a stopOn= option, but it would need to stop on an expression of the same type as the one it is trying to match...

Update: My second idea was to rewrite the Optional construct as an Or construct:

a = pp.Word(pp.nums).setName("PART_A")('A')
b = pp.Word(pp.nums).setName("PART_B")('B')
just_b = b
a_and_b = a + b
expr = pp.Or(just_b, a_and_b)

However, this now fails for texts of the form "123 456", despite the fact that a_and_b is an alternative in the Or class...

Any suggestions on what to do?

Upvotes: 1

Views: 31

Answers (1)

PaulMcG

Reputation: 63709

You are constructing the Or incorrectly; it should be:

expr = pp.Or([just_b, a_and_b])

The way you are constructing it, the Or is built with just_b as its only alternative, and a_and_b is passed as the boolean savelist argument.
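As a minimal sketch of the corrected construction (assuming pyparsing is imported as pp; the names both and only_b are only illustrative), passing the alternatives as a single list lets the Or consider both of them:

```python
import pyparsing as pp

a = pp.Word(pp.nums)("A")
b = pp.Word(pp.nums)("B")

# Or takes one list of alternatives and keeps the longest match.
expr = pp.Or([b, a + b])

both = expr.parseString("123 456").asDict()
only_b = expr.parseString("456").asDict()
print(both)
print(only_b)
```

Note that when only one number is present, the result simply has no 'A' entry rather than an empty string.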

Please consider using the operator overloads to construct And, Or, MatchFirst, and Each expressions ('+', '^', '|', and '&' respectively).

integer = pp.Word(pp.nums)

a = integer("A")
b = integer("B")

expr = a + b | b

The explicit style looks just so, well, Java-ish.

To answer the question in your title, you have pretty much solved this already: be sure to try matching the full a_and_b expression, either by placing it first in a MatchFirst (as my sample code does), or by using an Or expression (using the '^' operator, or by constructing an Or from a list of the just_b and a_and_b expressions).
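To illustrate the difference, here is a small sketch that runs the four sample strings from the question through both styles. The names first_match, longest, wrong_order, results and rejected are made up for this example, and parseAll=True is used so that leftover input counts as a failure:

```python
import pyparsing as pp

integer = pp.Word(pp.nums)
a = integer("A")
b = integer("B")

# MatchFirst (|) takes the first alternative that matches, so the
# longer a + b alternative has to come before the bare b.
first_match = a + b | b
# Or (^) evaluates every alternative and keeps the longest match,
# so the order of the alternatives does not matter.
longest = b ^ a + b

samples = ["123 456", "456", "123", "1 123"]
results = {}
for text in samples:
    # parseAll=True requires the whole input to be consumed.
    parsed = first_match.parseString(text, parseAll=True).asDict()
    assert parsed == longest.parseString(text, parseAll=True).asDict()
    results[text] = parsed
    print(repr(text), "->", parsed)

# Reversing the MatchFirst order lets b match "123" and stop there,
# so the trailing "456" is left unparsed and parseAll=True rejects it.
wrong_order = b | a + b
try:
    wrong_order.parseString("123 456", parseAll=True)
    rejected = False
except pp.ParseException:
    rejected = True
print("wrong order rejected:", rejected)
```

With either working form, an input containing a single number yields only a 'B' entry; if an empty 'A' is wanted, it would have to be filled in after parsing.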

Upvotes: 1
