Reputation: 1365
I have a binary wire protocol configuration file that I am trying to parse. This is used to allow computers on both sides of a low bandwidth link to agree on which bits represent which data in a way that allows users to configure them in the field.
The config file strings look like this:
abc:16 => identifier abc has 16 bits
abc:16 def:12 => identifier abc has 16 bits and identifier def has 12
abc:16:p => identifier abc has 16 bits and a single parity bit
abc:16:ecc => identifier abc has 16 bits and two bits for ecc
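To make the format concrete, here is a small plain-Python sketch (no pyparsing involved) that computes how many bits each field would occupy under the rules listed above. The helper name `field_widths`, the regex, and the `EXTRA_BITS` table are hypothetical and exist only for illustration. The sketch assumes that the parity/ECC bits come in addition to the stated bit count, and that identifiers are 3-4 letters and bit counts 1-2 digits, matching the grammar shown later in the question.

```python
import re

# Hypothetical helper for illustration only; not part of the real tool.
# One field: identifier, bit count, optional ":p" or ":ecc" suffix.
FIELD = re.compile(r"([A-Za-z]{3,4}):(\d{1,2})(?::(p|ecc))?", re.IGNORECASE)

# Assumption: the suffix adds bits on top of the stated count
# (":p" adds one parity bit, ":ecc" adds two ECC bits).
EXTRA_BITS = {None: 0, "p": 1, "ecc": 2}

def field_widths(line):
    """Return [(identifier, total_bits), ...] for one config line."""
    widths = []
    for chunk in line.split():
        m = FIELD.fullmatch(chunk)
        if m is None:
            raise ValueError("bad field: %r" % chunk)
        name, bits, suffix = m.group(1), int(m.group(2)), m.group(3)
        key = suffix.lower() if suffix else None
        widths.append((name, bits + EXTRA_BITS[key]))
    return widths

print(field_widths("abc:16 def:12:p ghi:8:ecc"))
# [('abc', 16), ('def', 13), ('ghi', 10)]
```

A malformed field such as `def:q` or `ghi:21:` raises `ValueError` here, which mirrors the failure the grammar is expected to report.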
I am to the point where I've got a grammar that I THINK should parse this correctly but I'm running into a strange problem: I can only have an identifier without parity or ecc as the last statement on a line. The grammar SHOULD support having identifiers with or without parity anywhere on the line, but for whatever reason that doesn't happen.
So:
abc:16
by itself is OK since there is nothing after it
abc:16:p def:12
is OK because the abc:16:p has a parity on the end
abc:16 def:12
is NOT OK because the abc:16 doesn't have parity AND it's not at the end, but this should be fine
abc:16 def:12:p
also is NOT OK because the non-parity statement isn't at the end, but this also should be perfectly OK
Here is the program:
from pyparsing import *
import re
abbr = Word(alphas, min=3, max=4)
#abbr = abbr.setDebug()
separator = Suppress(Literal(":"))
bits = Word(nums, min=1, max=2)
parity = Or([CaselessLiteral("P"), CaselessLiteral("ECC")])
bits_part = separator + bits
#bits_part = bits_part.setDebug()
parity_part = separator + parity
#parity_part = parity_part.setDebug()
statement = abbr + bits_part + Optional(parity_part)
#statement = statement.setDebug()
statement_list = StringStart() + statement + ZeroOrMore(Suppress(White()) + statement) + Optional(Suppress(White())) + StringEnd()
tests = (
    "abc:16",
    "abc:15:p",
    "abc:15:p def:14:ecc",
    "abc:17:p def:q ghi:21:", #this one should fail since "q" isn't parity and you shouldn't have a trailing colon with no parity after it
    "abc:16:p def:12", #this passes so it's OK to have a trailing statement without parity
    "abc:15 def:12:p", #this fails but shouldn't
    "abc:16:p def:12 pqr:11", #this is also failing because anything but the last statement missing parity causes failure, but I don't think that's the right behavior
)
for t in tests:
    try:
        print t
        print statement_list.parseString(t)
    except Exception as e:
        print e
When I run it without debugging turned on I see the following results. According to my understanding (and the comments above), only the fourth example should fail, since it has the "q" where the "p" for parity should be. Everything else should pass, but some of the other tests raise an exception for a reason I don't understand.
abc:16
['abc', '16']
abc:15:p
['abc', '15', 'P']
abc:15:p def:14:ecc
['abc', '15', 'P', 'def', '14', 'ECC']
abc:17:p def:q ghi:21:
Expected end of text (at char 9), (line:1, col:10)
abc:16:p def:12
['abc', '16', 'P', 'def', '12']
abc:15 def:12:p
Expected end of text (at char 7), (line:1, col:8)
abc:16:p def:12 pqr:11
Expected end of text (at char 16), (line:1, col:17)
When I turn debugging on (it's all commented out in the above example code) and look at just the "abc:15 def:12:p" case, this is the output:
abc:15 def:12:p
Match {W:(abcd...) {Suppress:(":") W:(0123...)} [{Suppress:(":") {'P' ^ 'ECC'}}]} at loc 0(1,1)
Match W:(abcd...) at loc 0(1,1)
Matched W:(abcd...) -> ['abc']
Match {Suppress:(":") W:(0123...)} at loc 3(1,4)
Matched {Suppress:(":") W:(0123...)} -> ['15']
Match {Suppress:(":") {'P' ^ 'ECC'}} at loc 7(1,8)
Exception raised:Expected ":" (at char 7), (line:1, col:8)
Matched {W:(abcd...) {Suppress:(":") W:(0123...)} [{Suppress:(":") {'P' ^ 'ECC'}}]} -> ['abc', '15']
Expected end of text (at char 7), (line:1, col:8)
In my mind that confirms that it's trying to match the parity_part, which obviously isn't there. But I've got it set so that the parity_part is Optional() so I can't figure out why it's insisting on finding it.
Furthermore, there is a whitespace char there (between abc:15 and def:12:p) which I feel should be triggering it to move on, the way I have specified in the statement_list portion of the grammar. To that end I have also tacked on a "leaveWhitespace()" call to the exerciser at the end:
print statement_list.parseString(t).leaveWhitespace()
But that didn't change anything (in that it didn't start parsing the way I would expect) so I don't believe that the problem is that it's missing the whitespace. I can't discount it entirely of course.
I am getting pretty perplexed here because I've tackled this from every angle that I can think of and I still don't get what I would expect. Am I specifying the grammar wrong? Is pyparsing doing something wrong? I feel pretty confident that I've made the mistake somewhere but I really can't see it.
EDIT:
So Paul has pointed out that I've got a bunch of dumb whitespace stuff everywhere, and that when he trashed all that and simplified, things worked fine. The whitespace stuff was there on purpose because I was going to try to prevent people from doing something like:
"abc : 10 : ecc"
because it looks bad, not because it doesn't contain the right information.
I am not sure that it's worth it to me to prevent people from putting spaces where I think they shouldn't so Paul's answer is probably good enough for me to move on with my life.
But I am still curious why the version I cooked up didn't work and the modifications he made did. They look functionally equivalent to me.
Upvotes: 2
Views: 459
Reputation: 63729
You do know that pyparsing will skip over whitespace on its own, yes?
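As a minimal illustration of that default (written for Python 3; the names `pair` and `field` are made up for this sketch), neither grammar below mentions whitespace at all, yet both accept input with spaces between tokens:

```python
from pyparsing import Suppress, Word, alphas, nums

# No White() anywhere: whitespace between tokens is skipped by default.
pair = Word(alphas) + Word(nums)
print(pair.parseString("abc 16").asList())       # ['abc', '16']
print(pair.parseString("abc      16").asList())  # ['abc', '16']

# The same default applies around a suppressed ":" separator.
field = Word(alphas) + Suppress(":") + Word(nums)
print(field.parseString("abc : 10").asList())    # ['abc', '10']
```

This also means that a grammar relying on the default skipping will accept spaced-out input such as "abc : 10", which is the tolerance discussed in the question's edit.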
I get this to work by defining statement_list as just plain:
statement_list = OneOrMore(statement)
To keep the multiple statements from running together, you should use Group:
statement_list = OneOrMore(Group(statement))
And instead of adding your own StringEnd to force the parser to try to process the full string, use parseAll=True:
print statement_list.parseString(t, parseAll=True)
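Putting those three suggestions together, a self-contained sketch might look like the following. It is written for Python 3 and reuses the element names from the question; the `parse_line` wrapper is a hypothetical convenience added here, not something from the original code.

```python
from pyparsing import (CaselessLiteral, Group, OneOrMore, Optional,
                       ParseException, Suppress, Word, alphas, nums)

abbr = Word(alphas, min=3, max=4)
separator = Suppress(":")
bits = Word(nums, min=1, max=2)
parity = CaselessLiteral("P") | CaselessLiteral("ECC")

# One statement: identifier, bit count, optional parity/ECC suffix.
statement = abbr + separator + bits + Optional(separator + parity)

# No explicit White(), StringStart() or StringEnd(): whitespace between
# statements is skipped by default, and parseAll=True demands full input.
statement_list = OneOrMore(Group(statement))

def parse_line(text):
    """Hypothetical wrapper: per-statement token lists, or None on error."""
    try:
        return statement_list.parseString(text, parseAll=True).asList()
    except ParseException:
        return None

for t in ("abc:15 def:12:p", "abc:16:p def:12 pqr:11", "abc:17:p def:q ghi:21:"):
    print(t, "->", parse_line(t))
```

With this version the two inputs that failed in the question parse into one group per statement, while the line containing "def:q" is still rejected.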
Upvotes: 2