Mauli

Reputation: 17203

ANTLR (field=value), how to express this?

I'm a total lexer and parser newbie, so please have some patience. Eventually I want to be able to express LDAP-style query strings, e.g. '(foo=bar)', '(!foo=bar)', '(&(foo=bar)(!zip=zap))', and end up with a tree that I could use to create the actual database query (or whatever).

So I thought I'd start with the simplest form and parse expressions like (foo=bar) and (!foo=bar), but I'm already having trouble understanding what happens. I just want to express that a field is separated from its value by a '=', but ANTLR seems to eat all the characters at once because the identifier looks a lot like a value. What do I have to do to prevent this?

grammar FilterExpression;

options
{
    language=Java;
    k=2;
}

tokens
{
    NOT='!';
}

term    :   '(' NOT? FIELD '=' VALUE ')';
// lexer
FIELD   :   NAME;
VALUE   :   CDATA;

fragment NAME
    :   ALPHA+;
fragment CDATA
    :   ALPHA*;
fragment ALPHA
    :   ('a'..'z' | 'A'..'Z');

Upvotes: 1

Views: 769

Answers (2)

a_m0d

Reputation: 12195

Okay, you are on the right track here; there are just a few things you need to change. You will have to express the field name and field value in the parser rather than in the lexer, since the lexer has no way to tell the difference between the two. Having multiple lexer rules that use the same fragment makes it very hard (impossible!) for the lexer to determine which one you want. Moving the distinction between name and value to the parser makes it very easy. To make the value optional, just make that parser term optional (with the '?' behind it). See below for the parse tree produced with the modified grammar (hopefully this is what you were after). I have also pasted the modified grammar at the bottom of my answer.
[Image: parse tree produced with the modified grammar] http://img268.imageshack.us/img268/7374/graphw.png

grammar FilterExpression;

options
{
    language=Java;
    k=2;
}

tokens
{
    NOT='!';
}

// parser
term    :       '(' NOT? field '=' value? ')';
field   :       ID;
value   :       ID;

// lexer
ID  :   ALPHA+
    ;

fragment ALPHA
    :   ('a'..'z' | 'A'..'Z');
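
To illustrate why two lexer rules built on the same characters cannot be told apart, here is a small Java sketch of a first-rule-wins tokenizer. It is a simplification for illustration only and not how ANTLR works internally; the class name AmbiguousLexerSketch, the PUNCT rule, and the regex patterns are assumptions of mine standing in for FIELD and VALUE from the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not generated by ANTLR and not ANTLR's algorithm.
final class AmbiguousLexerSketch {

    // Token rules in declaration order, mirroring FIELD (ALPHA+) and VALUE (ALPHA*).
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("FIELD", Pattern.compile("[a-zA-Z]+"));
        RULES.put("VALUE", Pattern.compile("[a-zA-Z]*"));
        RULES.put("PUNCT", Pattern.compile("[()!=]"));
    }

    /** Labels each token with the first rule that matches at the current position. */
    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            boolean matched = false;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Ignore empty matches so the loop always makes progress.
                if (m.lookingAt() && m.end() > pos) {
                    out.add(rule.getKey() + ":" + m.group());
                    pos = m.end();
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                throw new IllegalArgumentException("Unexpected character at index " + pos);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The lexer sees only characters, so it has no idea which side of '=' it is on.
        System.out.println(tokenize("(foo=bar)"));
    }
}
```

Both foo and bar come back labelled FIELD here, because nothing in the characters themselves says which side of the '=' they sit on. VALUE never gets a chance, which is why the distinction has to move into the parser.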

Upvotes: 2

Sam Martin

Reputation: 1031

If fields and values are both identifiers, where an identifier is a non-empty string of alphabetic characters (allowing a value to be empty, as in your example), you could do something like:

term    :       '(' NOT? field '=' value ')';

field : IDENTIFIER ;

value : IDENTIFIER? ;

// lexer
IDENTIFIER : ALPHA+ ;

fragment ALPHA
    :   ('a'..'z' | 'A'..'Z');

Since the lexer can't tell a field from a value, you'd need to let the lexer treat them the same, and use the parser to tell the difference based on the context.
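
The same idea can be sketched by hand in plain Java, without ANTLR: the tokenizer emits one kind of identifier token, and the parsing step decides from position alone which identifier is the field and which is the value. Treat this as a rough illustration under my own assumptions; the class FilterTermSketch, its Term type, and the method names are made up for this example and are not what ANTLR would generate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written equivalent of: term : '(' NOT? field '=' value ')' ;
final class FilterTermSketch {

    /** Result of parsing one term such as (!foo=bar). */
    static final class Term {
        final boolean negated;
        final String field;
        final String value; // empty string when the value is omitted

        Term(boolean negated, String field, String value) {
            this.negated = negated;
            this.field = field;
            this.value = value;
        }
    }

    private static boolean isAlpha(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    /** Lexer step: every run of letters becomes the same kind of token. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (isAlpha(c)) {
                int start = i;
                while (i < input.length() && isAlpha(input.charAt(i))) {
                    i++;
                }
                tokens.add(input.substring(start, i));
            } else if (c == '(' || c == ')' || c == '!' || c == '=') {
                tokens.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("Unexpected character '" + c + "' at index " + i);
            }
        }
        return tokens;
    }

    private static boolean isIdentifier(List<String> t, int pos) {
        return pos < t.size() && isAlpha(t.get(pos).charAt(0));
    }

    private static int expect(List<String> t, int pos, String literal) {
        if (pos >= t.size() || !t.get(pos).equals(literal)) {
            throw new IllegalArgumentException("Expected '" + literal + "' at token " + pos);
        }
        return pos + 1;
    }

    /** Parser step: context (before or after '=') tells field from value. */
    static Term parseTerm(String input) {
        List<String> t = tokenize(input);
        int pos = expect(t, 0, "(");
        boolean negated = pos < t.size() && t.get(pos).equals("!");
        if (negated) {
            pos++;
        }
        if (!isIdentifier(t, pos)) {
            throw new IllegalArgumentException("Expected a field name at token " + pos);
        }
        String field = t.get(pos++);          // identifier before '=' is the field
        pos = expect(t, pos, "=");
        String value = isIdentifier(t, pos) ? t.get(pos++) : ""; // value may be empty
        pos = expect(t, pos, ")");
        if (pos != t.size()) {
            throw new IllegalArgumentException("Unexpected trailing input at token " + pos);
        }
        return new Term(negated, field, value);
    }

    public static void main(String[] args) {
        Term term = parseTerm("(!foo=bar)");
        System.out.println("negated=" + term.negated + " field=" + term.field + " value=" + term.value);
    }
}
```

Note that tokenize never needs to know about fields or values; only parseTerm does, which mirrors the split between IDENTIFIER in the lexer and field/value in the parser.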

Upvotes: 0
