I am implementing a parser for a fairly complex grammar using PyParsing. (Which, if I may add, is really a pleasure to use!)
The grammar is somewhat 'dynamic' in that it allows for the definition of (various) alphabets, which in turn define the elements allowed in other definitions. As an example:
```
alphabet: a b c
lists:
s1 = a b
s2 = b c x
```
Here, `alphabet` is meant to define what elements are allowed in the `lists` definitions. E.g., `s1` would be valid, but `s2` contains an invalid `x`.
A simple PyParsing parser without that kind of validation could look like this:
```python
from pyparsing import (Literal, lineEnd, Word, alphanums,
                       OneOrMore, Group, Suppress, dictOf)

def fixedToken(literal):
    return Suppress(Literal(literal))

Element = Word(alphanums)
Alphabet = Group(OneOrMore(~lineEnd + Element))
AlphaDef = fixedToken("alphabet:") + Alphabet
ListLine = OneOrMore(~lineEnd + Element)
Lists = dictOf(Word(alphanums) + fixedToken("="), ListLine)
Start = AlphaDef + fixedToken("lists:") + Lists

if __name__ == "__main__":
    data = """
alphabet: a b c
lists:
s1 = a b
s2 = b c x
"""
    res = Start.parseString(data)
    for k, v in sorted(res.items()):
        print("{} = {}".format(k, v))
```
This will parse & give the output:
```
Alphabet= set(['a', 'c', 'b'])
s1 = ['a', 'b']
s2 = ['b', 'c', 'x']
```
However, I would like the parser to raise a `ParseException` (or similar) for `s2`, since it contains the invalid `x`. Ideally, I would like the definition of `ListLine` to say something like `OneOrMore(oneOf(Alphabet))`, but evidently that would require some dynamic interpretation, which can only be done once `Alphabet` has actually been parsed and assembled.
One solution I found was to add parse actions that (1) remember the alphabet and (2) validate the lines:
```python
# ...
from pyparsing import ParseException  # needed by lineValidate below

Alphabet = Group(OneOrMore(~lineEnd + Element))

def alphaHold(toks):
    # Remember the parsed alphabet as an attribute of this function.
    alphaHold.alpha = set(*toks)
    print("Alphabet= {}".format(alphaHold.alpha))

Alphabet.addParseAction(alphaHold)
AlphaDef = fixedToken("alphabet:") + Alphabet

ListLine = OneOrMore(~lineEnd + Element)

def lineValidate(toks):
    # Reject a line that uses any element missing from the remembered alphabet.
    unknown = set(toks).difference(alphaHold.alpha)
    if unknown:
        msg = "Unknown element(s): {}".format(unknown)
        print(msg)
        raise ParseException(msg)

ListLine.addParseAction(lineValidate)
# ...
```
This gives almost the desired output:
```
Alphabet= set(['a', 'c', 'b'])
Unknown element(s): set(['x'])
s1 = ['a', 'b']
```
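But unfortunately, PyParsing treats a `ParseException` raised inside a parse action as an ordinary failed match: `ListLine` simply fails for `s2`, `dictOf` stops before it, and the error never reaches me. So this approach fails on a technicality. Is there another way to achieve this within PyParsing which I might have missed?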
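A minimal, self-contained sketch that reproduces this behaviour (Python 3, using the same camelCase pyparsing API as the question; `known`, `remember_alphabet` and `validate_line` are hypothetical names standing in for `alphaHold` and `lineValidate`). As far as I can tell, the `ParseException` raised by the parse action just makes `list_line` fail for `s2`; `dictOf` then stops after `s1`, and because `parseString` is not asked to consume the whole input, nothing is reported. Passing `parseAll=True` does produce an error located at `s2`, but it appears to be a generic end-of-text error rather than the custom message.

```python
from pyparsing import (Group, LineEnd, OneOrMore, ParseException, Suppress,
                       Word, alphanums, dictOf)

line_end = LineEnd()
element = Word(alphanums)
alphabet = Group(OneOrMore(~line_end + element))
list_line = OneOrMore(~line_end + element)

known = set()  # hypothetical module-level store for the declared alphabet

def remember_alphabet(toks):
    # toks[0] is the Group holding the declared letters.
    known.clear()
    known.update(toks[0])

def validate_line(toks):
    unknown = set(toks) - known
    if unknown:
        # Raised from a parse action, this only makes list_line fail to match.
        raise ParseException("Unknown element(s): {}".format(sorted(unknown)))

alphabet.setParseAction(remember_alphabet)
list_line.setParseAction(validate_line)

start = (Suppress("alphabet:") + alphabet + Suppress("lists:")
         + dictOf(Word(alphanums) + Suppress("="), list_line))

data = """
alphabet: a b c
lists:
s1 = a b
s2 = b c x
"""

res = start.parseString(data)
print(sorted(res.keys()))  # s2 is silently dropped, not reported

try:
    start.parseString(data, parseAll=True)  # leftover text now raises
except ParseException as exc:
    print("rejected at:", exc.line)
```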
***
You are already pretty close to having this working. There are a number of cases where a pyparsing parser dynamically adjusts itself based on text that was previously parsed. The trick is to use a `Forward` placeholder expression, and then insert the desired values into the placeholder as part of a parse action (very close to what you have in place now). Like this:
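```python
from pyparsing import Forward, OneOrMore, alphas, lineEnd, oneOf

Element = Forward()
Alphabet = OneOrMore(~lineEnd + oneOf(list(alphas)))

def alphaHold(toks):
    # Define the placeholder as "any one of the letters just parsed".
    Element << oneOf(toks.asList())
Alphabet.setParseAction(alphaHold)
```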
From here, I think the rest of your code works fairly well as-is. Actually, you won't even need the line validating function, as pyparsing will only match valid element names as elements using this method.
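Putting the `Forward` idea together with the question's grammar gives roughly the following sketch (Python 3; the lowercase names are hypothetical, and the single-letter alphabet follows the answer). One thing worth noting: an unknown element such as `x` is not matched as an element, but by default it is not reported either; `s2` simply ends after `c` and the `x` is left unparsed. Requiring the whole input with `parseAll=True` turns that leftover text into a `ParseException`.

```python
from pyparsing import (Forward, LineEnd, OneOrMore, ParseException, Suppress,
                       Word, alphanums, alphas, dictOf, oneOf)

line_end = LineEnd()

# Placeholder; alpha_hold fills it in once the alphabet has been parsed.
element = Forward()

# As in the answer, alphabet entries are single letters.
alphabet = OneOrMore(~line_end + oneOf(list(alphas)))

def alpha_hold(toks):
    # Redefine the placeholder to match only the letters just declared.
    element << oneOf(toks.asList())

alphabet.setParseAction(alpha_hold)

list_line = OneOrMore(~line_end + element)
start = (Suppress("alphabet:") + alphabet + Suppress("lists:")
         + dictOf(Word(alphanums) + Suppress("="), list_line))

data = """
alphabet: a b c
lists:
s1 = a b
s2 = b c x
"""

res = start.parseString(data)
print(res["s2"].asList())  # 'x' is not an element, so s2 stops after 'c'

try:
    start.parseString(data, parseAll=True)  # the leftover 'x' is now an error
except ParseException as exc:
    print("rejected:", exc.line)
```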
You might find that pyparsing's error reporting is a little fuzzy. You can get things to be a little better using '-' instead of '+' in some judicious places. Since pyparsing uses ParseExceptions for all of its internal signalling of expression matches/mismatches, it does not automatically recognize when you have gotten into a defined expression, but then have an invalid match on a contained expression. You can tell pyparsing to detect this using the '-' operator, like this:
```python
ListDef = listName + '=' - OneOrMore(~lineEnd + Element)
```
Once pyparsing has matched a name and an '=' sign, a failure to match what follows will immediately raise a `ParseSyntaxException`, which stops pyparsing's scan of the text at that point and reports the exception at the location of the invalid element.
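A sketch of the `-` operator applied to this grammar, reusing the `Forward` setup (Python 3; `list_def` and the other lowercase names are hypothetical). One caveat, as I read pyparsing's behaviour: with exactly `listName + '=' - OneOrMore(...)`, an invalid element that follows valid ones just ends the `OneOrMore` early, so only an invalid *first* element triggers the error. Requiring a line end after the elements, inside the `-` part, makes a stray `x` at the end of a line raise `ParseSyntaxException` as well.

```python
from pyparsing import (Dict, Forward, Group, LineEnd, OneOrMore,
                       ParseSyntaxException, Suppress, Word, alphanums,
                       alphas, oneOf)

line_end = LineEnd()
element = Forward()  # placeholder, defined by alpha_hold as in the answer

alphabet = OneOrMore(~line_end + oneOf(list(alphas)))

def alpha_hold(toks):
    element << oneOf(toks.asList())

alphabet.setParseAction(alpha_hold)

# Everything after "name =" sits behind '-', so a mismatch there raises
# ParseSyntaxException instead of quietly backtracking. The suppressed
# line end means a bad element cannot simply end the OneOrMore early.
list_def = Group(Word(alphanums) + Suppress("=")
                 - (OneOrMore(~line_end + element) + line_end.suppress()))
start = (Suppress("alphabet:") + alphabet + Suppress("lists:")
         + Dict(OneOrMore(list_def)))

data = """
alphabet: a b c
lists:
s1 = a b
s2 = b c x
"""

try:
    start.parseString(data)
except ParseSyntaxException as exc:
    # Reports the position of the offending 'x' on the s2 line.
    print("line {}, col {}: {}".format(exc.lineno, exc.col, exc.line))
```

With the `x` removed from the last line, the same grammar parses `s2` as `['b', 'c']`.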