Why is pyparsing truncating a parse rather than raising an exception

Question

I am working on a problem that revolves around specifying a wire protocol format using text strings. The basic idea is that you only put binary on the wire because it's a low bandwidth environment. But in order to do that both sides must agree ahead of time what means what so that they can correctly extract values from the wire.

In order to coordinate the "what means what" a configuration file is used. The core of it is that every packet body has definitions. Here are some examples:

abc:16
abc:15:p
abc:15:p; def:14:ecc

You specify a three- or four-letter identifier followed by a colon, then the number of bits (generally a multiple of four, but this isn't guaranteed), and optionally another colon followed by either "p", meaning one bit is devoted to parity, or "ecc", meaning two bits are devoted to ECC.

So translating this to pyparsing this is what I've got:

from pyparsing import *

abbr = Word(alphas)
internal_separator = Literal(":")
bits = Word(nums, min=1, max=2)
parity = Or([CaselessLiteral("P"), CaselessLiteral("ECC")])
separator = Literal(";")

parity_part = internal_separator + parity
statement = Group(abbr + internal_separator + bits + Optional(parity_part))

#statement_list = delimitedList(statement, delim=";")
statement_list = statement + ZeroOrMore(separator + statement)

tests = ( 
    "abc:16",
    "abc:15:p", 
    "abc:15:p; def:14:ecc",
    "abc:17:p; def:q; ghi:21:", #this one should fail!
)

for t in tests:
    try:
        print(t, "->", statement_list.parseString(t))
    except Exception as e:
        print(e)

When I run this, here is what I get:

abc:16 -> [['abc', ':', '16']]
abc:15:p -> [['abc', ':', '15', ':', 'P']]
abc:15:p; def:14:ecc -> [['abc', ':', '15', ':', 'P'], ';', ['def', ':', '14', ':', 'ECC']]
abc:17:p; def:q; ghi:21: -> [['abc', ':', '17', ':', 'P']]

What I can't understand is why pyparsing simply truncates the output on the last test. To me it seems like it should fail because the input is invalid. I also tried delimitedList and got the same behavior.

I've also tried the "abbr" with a min of 3 and max of 4 so that it recognizes for sure that the ":q" is invalid but that didn't change anything.

It seems like the error is getting swallowed up somehow and I don't really know why, nor do I know how to get that error to propagate up so that I can catch it.

I found this question (Trouble doing simple parse in pyparsing) which seems to be rather related but doesn't give me the answer I'm looking for either.

BlackJack · Accepted Answer

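This is expected behavior with pyparsing: parseString() matches as much of the input as the grammar allows and returns without complaint once the remainder stops matching. To make leftover input an error, add an explicit StringEnd() to the grammar: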

statement_list = statement + ZeroOrMore(separator + statement) + StringEnd()
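To see the difference side by side, here is a minimal sketch that runs the failing input from the question through the grammar with and without the end-of-input anchor. The names loose, strict, colon and try_parse are made up for this illustration. It also passes parseAll=True to parseString(), which should give the same result as StringEnd() without touching the grammar.

```python
from pyparsing import (
    CaselessLiteral, Group, Literal, Optional, Or, ParseException,
    StringEnd, Word, ZeroOrMore, alphas, nums,
)

# Grammar pieces as in the question (colon is just a shorter name here).
abbr = Word(alphas)
colon = Literal(":")
bits = Word(nums, min=1, max=2)
parity = Or([CaselessLiteral("P"), CaselessLiteral("ECC")])
statement = Group(abbr + colon + bits + Optional(colon + parity))

# The same statement list, without and with an end-of-input anchor.
loose = statement + ZeroOrMore(Literal(";") + statement)
strict = loose + StringEnd()


def try_parse(grammar, text, **kwargs):
    """Hypothetical helper: return the tokens as a list, or None on failure."""
    try:
        return grammar.parseString(text, **kwargs).asList()
    except ParseException:
        return None


bad = "abc:17:p; def:q; ghi:21:"
print(try_parse(loose, bad))                 # keeps only the first statement
print(try_parse(strict, bad))                # None, StringEnd() rejects the leftovers
print(try_parse(loose, bad, parseAll=True))  # None, same effect via parseAll
print(try_parse(strict, "abc:15:p; def:14:ecc"))  # valid input still parses
```

If you want to report where parsing stopped, the caught ParseException carries the position (for example its loc and col attributes) rather than just a message.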
