Reputation: 9178
I have expressions like the following:
eg 1: (f1 AND f2)
eg 2: ((f1 OR f2) AND f3)
eg 3: ((f1 OR f2) AND (f3 OR (f4 AND f5)))
Each f(n) is used to generate a fragment of SQL, and these fragments are joined using the OR / AND operators given in the expression.
Now I want to:
1) Parse this expression
2) Validate it
3) Generate "Expression Tree" for the expression and use this tree to generate the final SQL.
I found this series of articles on writing tokenizers, parsers, etc., for example:
http://cogitolearning.co.uk/2013/05/writing-a-parser-in-java-the-expression-tree/
I also came across the ANTLR library and am wondering whether I can use it for my case.
Any tips?
Upvotes: 0
Views: 330
Reputation: 6576
I'm guessing you might only be interested in Java (it would be good to say so in future), but if you have a choice of languages, I would recommend using Python and parsy for a task like this. It is much more lightweight than tools like ANTLR.
Here is some example code I knocked together that parses your samples into appropriate data structures:
import attr
from parsy import string, regex, generate


@attr.s
class Variable():
    name = attr.ib()


@attr.s
class Compound():
    left_value = attr.ib()
    right_value = attr.ib()
    operator = attr.ib()


@attr.s
class Expression():
    value = attr.ib()

    # You could put an `evaluate` method here,
    # or `generate_sql` etc.


# Wrap a parser so that it skips surrounding whitespace.
whitespace = regex(r'\s*')
lexeme = lambda p: whitespace >> p << whitespace

AND = lexeme(string('AND'))
OR = lexeme(string('OR'))
OPERATOR = AND | OR
LPAREN = lexeme(string('('))
RPAREN = lexeme(string(')'))

# A variable is a word that does not begin with AND, OR or a parenthesis.
variable = lexeme((AND | OR | LPAREN | RPAREN).should_fail("not AND OR ( )") >> regex(r"\w+")).map(Variable)


# A compound is "( operand OPERATOR operand )", where each operand
# is either a variable or a nested compound.
@generate
def compound():
    yield LPAREN
    left = yield variable | compound
    op = yield OPERATOR
    right = yield variable | compound
    yield RPAREN
    return Compound(left_value=left,
                    right_value=right,
                    operator=op)


expression = (variable | compound).map(Expression)
I also use attrs for simple data structures.
The result of parsing is a hierarchy of expressions:
>>> expression.parse("((f1 OR f2) AND (f3 OR (f4 AND f5)))")
Expression(value=Compound(left_value=Compound(left_value=Variable(name='f1'), right_value=Variable(name='f2'), operator='OR'), right_value=Compound(left_value=Variable(name='f3'), right_value=Compound(left_value=Variable(name='f4'), right_value=Variable(name='f5'), operator='AND'), operator='OR'), operator='AND'))
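To get from that tree to the final SQL, a recursive walk is probably enough. Below is a rough sketch, not a definitive implementation: the `generate_sql` function, the `FRAGMENTS` mapping and the SQL conditions in it are all made up for illustration, and I've used plain dataclasses with the same field names as the attrs classes above so that the snippet runs on its own without parsy or attrs. It also does a simple validation step by rejecting any name that has no fragment registered.

```python
from dataclasses import dataclass
from typing import Dict, Union


# Stand-ins for the attrs classes above (same field names),
# so this sketch runs with the standard library only.
@dataclass
class Variable:
    name: str


@dataclass
class Compound:
    left_value: "Node"
    right_value: "Node"
    operator: str


Node = Union[Variable, Compound]


def generate_sql(node: Node, fragments: Dict[str, str]) -> str:
    """Recursively turn a parsed tree into a SQL boolean expression."""
    if isinstance(node, Variable):
        if node.name not in fragments:
            # Validation: every name must map to a known SQL fragment.
            raise KeyError(f"no SQL fragment registered for {node.name!r}")
        return fragments[node.name]
    left = generate_sql(node.left_value, fragments)
    right = generate_sql(node.right_value, fragments)
    # Parenthesize every compound so precedence follows the tree.
    return f"({left} {node.operator} {right})"


# Hypothetical fragments; in practice each f(n) would produce its own SQL.
FRAGMENTS = {
    "f1": "age > 18",
    "f2": "country = 'UK'",
    "f3": "active = 1",
}

# Tree for "((f1 OR f2) AND f3)", built by hand here instead of by the parser.
tree = Compound(
    left_value=Compound(Variable("f1"), Variable("f2"), "OR"),
    right_value=Variable("f3"),
    operator="AND",
)

sql = generate_sql(tree, FRAGMENTS)
print(sql)  # ((age > 18 OR country = 'UK') AND active = 1)
```

With the parsy parser above, you would pass `expression.parse(...).value` instead of the hand-built tree; since the field names match, the same walk should apply. If the fragments ever include user-supplied values, use query parameters rather than pasting the values into the SQL string.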
Upvotes: 2