Ashika Umanga Umagiliya

Reputation: 9178

Writing Custom Expression parser or using ANTLR library?

I have expressions like the following:

eg 1: (f1 AND f2)

eg 2: ((f1 OR f2) AND f3)

eg 3: ((f1 OR f2) AND (f3 OR (f4 AND f5)))

Each f(n) is used to generate a fragment of SQL, and these fragments are joined using the OR / AND operators given in the expression.

Now I want to:

1) Parse this expression

2) Validate it

3) Generate an "expression tree" for the expression and use this tree to generate the final SQL.

I found this series of articles on writing tokenizers, parsers, etc., for example:

http://cogitolearning.co.uk/2013/05/writing-a-parser-in-java-the-expression-tree/

I also came across the ANTLR library and am wondering whether I can use it for my case.

Any tips?

Upvotes: 0

Views: 330

Answers (1)

spookylukey

Reputation: 6576

I'm guessing you might only be interested in Java (it would be good to say so in future), but if you have a choice of languages, I would recommend using Python and parsy for a task like this. It is much more lightweight than tools like ANTLR.

Here is some example code I knocked together that parses your samples into appropriate data structures:

import attr
from parsy import string, regex, generate


@attr.s
class Variable():
    name = attr.ib()


@attr.s
class Compound():
    left_value = attr.ib()
    right_value = attr.ib()
    operator = attr.ib()


@attr.s
class Expression():
    value = attr.ib()
    # You could put an `evaluate` method here,
    # or `generate_sql` etc.


whitespace = regex(r'\s*')
lexeme = lambda p: whitespace >> p << whitespace


AND = lexeme(string('AND'))
OR = lexeme(string('OR'))
OPERATOR = AND | OR
LPAREN = lexeme(string('('))
RPAREN = lexeme(string(')'))
# A variable is any word that does not start with a keyword or a parenthesis.
variable = lexeme((AND | OR | LPAREN | RPAREN).should_fail("not AND OR ( )") >> regex(r"\w+")).map(Variable)


@generate
def compound():
    yield LPAREN
    left = yield variable | compound
    op = yield OPERATOR
    right = yield variable | compound
    yield RPAREN

    return Compound(left_value=left,
                    right_value=right,
                    operator=op)


expression = (variable | compound).map(Expression)

I also use attrs for simple data structures.

The result of parsing is a hierarchy of expressions:

>>> expression.parse("((f1 OR f2) AND (f3 OR (f4 AND f5)))")
Expression(value=Compound(left_value=Compound(left_value=Variable(name='f1'), right_value=Variable(name='f2'), operator='OR'), right_value=Compound(left_value=Variable(name='f3'), right_value=Compound(left_value=Variable(name='f4'), right_value=Variable(name='f5'), operator='AND'), operator='OR'), operator='AND'))
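
To get from that tree to the final SQL, a recursive walk over the nodes is probably enough. The following is only a rough sketch under some assumptions: it uses standard-library dataclasses as stand-ins for the attrs classes above (same field names), the `generate_sql` function and the `fragments` mapping are hypothetical names, and the SQL fragments are made up for illustration. It also does a simple validation step by rejecting variable names that have no known fragment.

```python
from dataclasses import dataclass
from typing import Dict, Union


# Stand-ins for the attrs classes above, with the same field names.
@dataclass
class Variable:
    name: str


@dataclass
class Compound:
    left_value: "Node"
    right_value: "Node"
    operator: str


Node = Union[Variable, Compound]


def generate_sql(node: Node, fragments: Dict[str, str]) -> str:
    """Recursively render a tree node as a SQL condition (hypothetical helper)."""
    if isinstance(node, Variable):
        if node.name not in fragments:
            # Validation: reject names that have no known SQL fragment.
            raise ValueError(f"unknown fragment: {node.name}")
        return fragments[node.name]
    left = generate_sql(node.left_value, fragments)
    right = generate_sql(node.right_value, fragments)
    # Parenthesise every compound so the nesting of the input is preserved.
    return f"({left} {node.operator} {right})"


# Made-up fragments; in practice each f(n) would produce its own SQL.
fragments = {"f1": "age > 18", "f2": "country = 'JP'", "f3": "active = 1"}

# Tree for "((f1 OR f2) AND f3)", built by hand here instead of parsed.
tree = Compound(
    left_value=Compound(Variable("f1"), Variable("f2"), "OR"),
    right_value=Variable("f3"),
    operator="AND",
)
sql = generate_sql(tree, fragments)
print(sql)
```

With the parser above you would pass the `value` of the parsed `Expression` instead of a hand-built tree, or move this logic into a `generate_sql` method on the classes themselves.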

Upvotes: 2
