Eurydice

Reputation: 8439

Using pyparsing for a query parser

I just learnt about the excellent pyparsing module and I would like to use it to make a query parser.

Basically, I would like to be able to parse the following kind of expression:

'b_Coherent == "1_2" or (symbol == 2 and nucleon != 3)'

where b_coherent, symbol, and nucleon are database keywords.

I carefully read one of the examples shipped with pyparsing (searchparser.py), which I think (I hope!) got me quite close to my goal, but something is still wrong.

Here is my code:

from pyparsing import *

logical_operator    = oneOf(['and','&','or','|'], caseless=True) 
not_operator        = oneOf(['not','^'], caseless=True) 
db_keyword          = oneOf(['nucleon','b_coherent','symbol','mass'], caseless=True)
arithmetic_operator = oneOf(['==','!=','>','>=','<', '<='])

value = Word(alphanums+'_')
quote = Combine('"' + value + '"') | value

selection = db_keyword + arithmetic_operator + (value|quote)
selection = selection + ZeroOrMore(logical_operator+selection)

parenthesis = Forward()
parenthesis << ((selection + parenthesis) | selection)
parenthesis = Combine('(' + parenthesis + ')') | selection

grammar = parenthesis + lineEnd

res = grammar.parseString('b_Coherent == "1_2" or (symbol == 2 and nucleon != 3)')

I have some trouble fully understanding the Forward object; maybe that is one reason my parser does not work properly. Do you have any idea what is wrong with my grammar?

Thanks a lot for your help!

Eric

Upvotes: 1

Views: 985

Answers (2)

PaulMcG

Reputation: 63762

You can use Forward to hand-craft your own expression nesting within parentheses, etc., but pyparsing's operatorPrecedence simplifies this whole process. See my updated form of your original code below, with comments:

from pyparsing import *

# break these up so we can represent higher precedence for 'and' over 'or'
#~ logical_operator    = oneOf(['and','&','or','|'], caseless=True) 
not_operator        = oneOf(['not','^'], caseless=True) 
and_operator        = oneOf(['and','&'], caseless=True) 
or_operator         = oneOf(['or' ,'|'], caseless=True) 

# db_keyword is okay, but you might just want to use a general 'identifier' expression,
# so you won't have to keep updating it as you add other terms to your query language
db_keyword          = oneOf(['nucleon','b_coherent','symbol','mass'], caseless=True)
ident = Word(alphas+'_', alphanums+'_')

# these aren't really arithmetic operators, they are comparison operators
#~ arithmetic_operator = oneOf(['==','!=','>','>=','<', '<='])
comparison_operator = oneOf(['==','!=','>','>=','<', '<='])

# instead of generic 'value', define specific value types 
#~ value = Word(alphanums+'_')
integer = Regex(r'[+-]?\d+').setParseAction(lambda t:int(t[0]))
float_ = Regex(r'[+-]?\d+\.\d*').setParseAction(lambda t:float(t[0]))

# use pyparsing's QuotedString class for this, it gives you quote escaping, and
# automatically strips quotes from the parsed text
#~ quote = Combine('"' + value + '"') | value
quote = QuotedString('"')

# when you are doing boolean expressions, it's always handy to add TRUE and FALSE literals
literal_true = Keyword('true', caseless=True)
literal_false = Keyword('false', caseless=True)
boolean_literal = literal_true | literal_false

# in future, you can expand comparison_operand to be its own operatorPrecedence 
# term, so that you can do things like "nucleon != 1+2" - but this is fine for now
comparison_operand = quote | db_keyword | ident | float_ | integer
comparison_expr = Group(comparison_operand + comparison_operator + comparison_operand)

# all this business is taken care of for you by operatorPrecedence
#~ selection = db_keyword + arithmetic_operator + (value|quote)
#~ selection = selection + ZeroOrMore(logical_operator+selection)
#~ parenthesis = Forward()
#~ parenthesis << ((selection + parenthesis) | selection)
#~ parenthesis = Combine('(' + parenthesis + ')') | selection
#~ grammar = parenthesis + lineEnd

boolean_expr = operatorPrecedence(comparison_expr | boolean_literal, 
    [
    (not_operator, 1, opAssoc.RIGHT),
    (and_operator, 2, opAssoc.LEFT),
    (or_operator,  2, opAssoc.LEFT),
    ])
grammar = boolean_expr

res = grammar.parseString('b_Coherent == "1_2" or (symbol == 2 and nucleon != 3)', parseAll=True)

print(res.asList())

This prints:

[[['b_coherent', '==', '1_2'], 'or', [['symbol', '==', 2], 'and', ['nucleon', '!=', 3]]]]
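The comment in the code above about expanding comparison_operand into its own operatorPrecedence term could look something like the following sketch. It uses infixNotation, the later name for operatorPrecedence (the old name was removed in pyparsing 3.0). The number and ident definitions are illustrative assumptions, and numbers are kept unsigned so that '-' is only ever read as a binary operator.

```python
from pyparsing import (Group, QuotedString, Regex, Word, alphanums, alphas,
                       infixNotation, oneOf, opAssoc)

# Numbers become int or float; unsigned, so '-' is only a binary operator here.
number = Regex(r"\d+(?:\.\d*)?").setParseAction(
    lambda t: float(t[0]) if "." in t[0] else int(t[0]))
ident = Word(alphas + "_", alphanums + "_")

# Arithmetic on either side of a comparison: '*' and '/' bind tighter than
# '+' and '-'; infixNotation also handles parenthesized sub-expressions.
arith_expr = infixNotation(number | ident | QuotedString('"'), [
    (oneOf("* /"), 2, opAssoc.LEFT),
    (oneOf("+ -"), 2, opAssoc.LEFT),
])

# oneOf tests longer operators first, so '>=' is not cut short by '>'.
comparison_operator = oneOf("== != > >= < <=")
comparison_expr = Group(arith_expr + comparison_operator + arith_expr)

print(comparison_expr.parseString("nucleon != 1+2", parseAll=True).asList())
```

This prints `[['nucleon', '!=', [1, '+', 2]]]`: a lone operand comes back as a plain token, while an arithmetic expression is grouped into its own nested list, so it slots into the comparison the same way a single value would.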

From here, I suggest you study the next step: turning the parse results into something you can actually evaluate. Check out the simpleBool.py example from the pyparsing wiki to see how this is done when using operatorPrecedence.
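As a rough sketch of that next step, written in the spirit of simpleBool.py rather than copied from it: each infixNotation level gets a parse-action class with an evaluate() method, and the query is evaluated against a plain dict standing in for a database row. The node class names, the matches() helper, and the record layout are all hypothetical. One deliberate change from the grammar above: not/and/or are CaselessKeyword, because a caseless oneOf would also match the start of a field name such as "nothing".

```python
import operator

from pyparsing import (CaselessKeyword, Group, QuotedString, Regex, Word,
                       alphanums, alphas, infixNotation, oneOf, opAssoc)

# Comparison symbols mapped to the matching Python functions.
COMPARE = {"==": operator.eq, "!=": operator.ne, ">": operator.gt,
           ">=": operator.ge, "<": operator.lt, "<=": operator.le}


class ComparisonNode:
    """Leaf: field <op> value, evaluated against one record (a dict)."""
    def __init__(self, tokens):
        self.field, self.op, self.value = tokens[0]

    def evaluate(self, record):
        return COMPARE[self.op](record[self.field], self.value)


class NotNode:
    def __init__(self, tokens):
        self.operand = tokens[0][1]  # tokens[0] is [not_operator, operand]

    def evaluate(self, record):
        return not self.operand.evaluate(record)


class AndNode:
    def __init__(self, tokens):
        self.operands = tokens[0][0::2]  # skip the operator tokens

    def evaluate(self, record):
        return all(o.evaluate(record) for o in self.operands)


class OrNode(AndNode):  # same constructor, different combination rule
    def evaluate(self, record):
        return any(o.evaluate(record) for o in self.operands)


# Field names are lowercased so 'b_Coherent' looks up the key 'b_coherent'.
field = Word(alphas + "_", alphanums + "_").setParseAction(lambda t: t[0].lower())
number = Regex(r"[+-]?\d+(?:\.\d*)?").setParseAction(
    lambda t: float(t[0]) if "." in t[0] else int(t[0]))
value = QuotedString('"') | number
comparison = Group(field + oneOf("== != >= <= > <") + value)
comparison.setParseAction(ComparisonNode)

query = infixNotation(comparison, [
    (CaselessKeyword("not") | "^", 1, opAssoc.RIGHT, NotNode),
    (CaselessKeyword("and") | "&", 2, opAssoc.LEFT, AndNode),
    (CaselessKeyword("or") | "|", 2, opAssoc.LEFT, OrNode),
])


def matches(query_string, record):
    """Parse query_string and evaluate the resulting tree against record."""
    tree = query.parseString(query_string, parseAll=True)[0]
    return tree.evaluate(record)


record = {"b_coherent": "1_2", "symbol": 5, "nucleon": 3, "mass": 12.0}
print(matches('b_Coherent == "1_2" or (symbol == 2 and nucleon != 3)', record))
```

With this record the query is true because the first comparison already matches. A field missing from the record raises KeyError; whether that should count as a non-match or as an error is a design choice left open here.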

I'm glad to hear you are enjoying pyparsing, welcome!

Upvotes: 1

John

Reputation: 13699

Forward is a forward declaration of an expression to be defined later, used for recursive grammars such as algebraic infix notation. When the expression is known, it is assigned to the Forward variable using the '<<' operator.
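A minimal sketch of that mechanism, using an illustrative grammar for integer sums with nested parentheses (the names are made up for this example). The key point is that the parenthesized alternative sits inside the recursive definition, and the Forward is filled in place rather than rebound. In the question's grammar, the final `parenthesis = Combine(...) | selection` assignment creates a new expression outside the recursion, so parentheses can never nest; Combine also requires its tokens to be adjacent by default, which fails on input with spaces such as "( symbol == 2 )".

```python
from pyparsing import Forward, Group, Suppress, Word, ZeroOrMore, nums, oneOf

# Declare the recursive expression first; its definition refers to itself.
expr = Forward()

integer = Word(nums).setParseAction(lambda t: int(t[0]))
add_op = oneOf("+ -")
LPAR, RPAR = Suppress("("), Suppress(")")

# An operand is a plain integer or a parenthesized expr; the reference to
# 'expr' inside the parentheses is what makes arbitrary nesting possible.
operand = integer | Group(LPAR + expr + RPAR)

# Fill in the Forward in place. '<<=' is safer than '<<' because Python binds
# '<<' tighter than '|', so 'expr << a | b' would mean '(expr << a) | b'.
expr <<= operand + ZeroOrMore(add_op + operand)

print(expr.parseString("1 + (2 - (3 + 4))", parseAll=True).asList())
```

This prints `[1, '+', [2, '-', [3, '+', 4]]]`, with each parenthesized level grouped into its own sub-list.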

Upvotes: 0
