Reputation: 8439
I just learnt about the excellent pyparsing module and I would like to use it to make a query parser.
Basically I would like to be able to parse the following kind of expression:
'b_Coherent == "1_2" or (symbol == 2 and nucleon != 3)'
where b_coherent, symbol and nucleon are keywords of a database.
I carefully read one of the examples shipped with pyparsing (searchparser.py), which I think (I hope!) got me quite close to my goal, but something is still wrong.
Here is my code:
from pyparsing import *
logical_operator = oneOf(['and','&','or','|'], caseless=True)
not_operator = oneOf(['not','^'], caseless=True)
db_keyword = oneOf(['nucleon','b_coherent','symbol','mass'], caseless=True)
arithmetic_operator = oneOf(['==','!=','>','>=','<', '<='])
value = Word(alphanums+'_')
quote = Combine('"' + value + '"') | value
selection = db_keyword + arithmetic_operator + (value|quote)
selection = selection + ZeroOrMore(logical_operator+selection)
parenthesis = Forward()
parenthesis << ((selection + parenthesis) | selection)
parenthesis = Combine('(' + parenthesis + ')') | selection
grammar = parenthesis + lineEnd
res = grammar.parseString('b_Coherent == "1_2" or (symbol == 2 and nucleon != 3)')
I am having some trouble fully understanding the Forward object, which may be one reason my parser does not work properly. Do you have any idea what is wrong with my grammar?
Thanks a lot for your help.
Eric
Upvotes: 1
Views: 985
Reputation: 63762
You can use Forward to hand-craft your own expression nesting within parentheses, etc., but pyparsing's operatorPrecedence simplifies this whole process. See my updated form of your original code below, with comments:
from pyparsing import *
# break these up so we can represent higher precedence for 'and' over 'or'
#~ logical_operator = oneOf(['and','&','or','|'], caseless=True)
not_operator = oneOf(['not','^'], caseless=True)
and_operator = oneOf(['and','&'], caseless=True)
or_operator = oneOf(['or' ,'|'], caseless=True)
# db_keyword is okay, but you might just want to use a general 'identifier' expression,
# you won't have to keep updating as you add other terms to your query language
db_keyword = oneOf(['nucleon','b_coherent','symbol','mass'], caseless=True)
ident = Word(alphas+'_', alphanums+'_')
# these aren't really arithmetic operators, they are comparison operators
#~ arithmetic_operator = oneOf(['==','!=','>','>=','<', '<='])
comparison_operator = oneOf(['==','!=','>','>=','<', '<='])
# instead of generic 'value', define specific value types
#~ value = Word(alphanums+'_')
integer = Regex(r'[+-]?\d+').setParseAction(lambda t:int(t[0]))
float_ = Regex(r'[+-]?\d+\.\d*').setParseAction(lambda t:float(t[0]))
# use pyparsing's QuotedString class for this, it gives you quote escaping, and
# automatically strips quotes from the parsed text
#~ quote = Combine('"' + value + '"') | value
quote = QuotedString('"')
# when you are doing boolean expressions, it's always handy to add TRUE and FALSE literals
literal_true = Keyword('true', caseless=True)
literal_false = Keyword('false', caseless=True)
boolean_literal = literal_true | literal_false
# in future, you can expand comparison_operand to be its own operatorPrecedence
# term, so that you can do things like "nucleon != 1+2" - but this is fine for now
comparison_operand = quote | db_keyword | ident | float_ | integer
comparison_expr = Group(comparison_operand + comparison_operator + comparison_operand)
# all this business is taken care of for you by operatorPrecedence
#~ selection = db_keyword + arithmetic_operator + (value|quote)
#~ selection = selection + ZeroOrMore(logical_operator+selection)
#~ parenthesis = Forward()
#~ parenthesis << ((selection + parenthesis) | selection)
#~ parenthesis = Combine('(' + parenthesis + ')') | selection
#~ grammar = parenthesis + lineEnd
boolean_expr = operatorPrecedence(comparison_expr | boolean_literal,
[
(not_operator, 1, opAssoc.RIGHT),
(and_operator, 2, opAssoc.LEFT),
(or_operator, 2, opAssoc.LEFT),
])
grammar = boolean_expr
res = grammar.parseString('b_Coherent == "1_2" or (symbol == 2 and nucleon != 3)', parseAll=True)
print(res.asList())
prints
[[['b_coherent', '==', '1_2'], 'or', [['symbol', '==', 2], 'and', ['nucleon', '!=', 3]]]]
From here, I suggest you study how to take the next step and create something you can actually evaluate. Check out the simpleBool.py example from the pyparsing wiki to see how this is done when using operatorPrecedence.
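As a rough illustration of that next step, here is a minimal sketch that walks the nested list printed above and evaluates it against one record. It is not the approach taken in simpleBool.py, which attaches classes as parse actions; this is just a plain recursive walk. The names evaluate, COMPARE and record are made up for the example, and it assumes the left operand of every comparison is a database keyword and the right operand is a literal value.

```python
import operator

# Map each comparison token to the matching Python function.
COMPARE = {'==': operator.eq, '!=': operator.ne, '>': operator.gt,
           '>=': operator.ge, '<': operator.lt, '<=': operator.le}

def evaluate(node, record):
    """Evaluate one node of the parsed query against a dict (hypothetical helper)."""
    if isinstance(node, str):            # boolean literal such as 'true'
        return node.lower() == 'true'
    if len(node) == 1:                   # outermost wrapper list
        return evaluate(node[0], record)
    if node[0] in ('not', '^'):          # ['not', operand]
        return not evaluate(node[1], record)
    if node[1] in COMPARE:               # [keyword, operator, value] (assumed order)
        keyword, op, value = node
        return COMPARE[op](record[keyword], value)
    # Otherwise: [operand, 'and'/'or', operand, ...]; operands sit at even indexes.
    operands = [evaluate(n, record) for n in node[0::2]]
    return all(operands) if node[1] in ('and', '&') else any(operands)

# The nested list printed by the grammar above.
parsed = [[['b_coherent', '==', '1_2'], 'or',
           [['symbol', '==', 2], 'and', ['nucleon', '!=', 3]]]]

matching = evaluate(parsed, {'b_coherent': '0_0', 'symbol': 2, 'nucleon': 1})
rejected = evaluate(parsed, {'b_coherent': '0_0', 'symbol': 2, 'nucleon': 3})
print(matching, rejected)
```

Once the tokens have been converted to Python values, a quoted string and a keyword name are both plain strings, which is why the sketch relies on operand position to tell them apart.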
I'm glad to hear you are enjoying pyparsing, welcome!
Upvotes: 1
Reputation: 13699
Forward declaration of an expression to be defined later - used for recursive grammars, such as algebraic infix notation. When the expression is known, it is assigned to the Forward variable using the '<<' operator.
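To make that concrete for the query in the question, here is a minimal sketch of a recursive grammar built with Forward. The names comparison, term and query are illustrative only, and this small grammar treats 'and' and 'or' strictly left to right with no precedence between them.

```python
from pyparsing import (Forward, Group, Suppress, Word, ZeroOrMore,
                       alphanums, alphas, oneOf)

# A single test such as: symbol == 2
comparison = Group(Word(alphas + '_') + oneOf('== !=') + Word(alphanums + '_'))

# Placeholder: 'query' is used inside 'term' before it has been defined.
query = Forward()

# A term is either a comparison or a whole query wrapped in parentheses.
term = comparison | Group(Suppress('(') + query + Suppress(')'))

# Now that 'term' exists, assign the real expression to the placeholder.
query << (term + ZeroOrMore(oneOf('and or') + term))

result = query.parseString('symbol == 2 and (nucleon != 3 or mass == 1)',
                           parseAll=True).asList()
print(result)
# [['symbol', '==', '2'], 'and', [['nucleon', '!=', '3'], 'or', ['mass', '==', '1']]]
```

The parentheses around the right-hand side of '<<' matter: without them Python would apply '<<' before '+' or '|' had combined the pieces.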
Upvotes: 0