Reputation: 17203
I'm a total lexer and parser newbie, so please have some patience. Eventually I want to be able to express LDAP-style query strings, e.g. '(foo=bar)', '(!foo=bar)', '(&(foo=bar)(!zip=zap))', and end up with a tree that I could use to create the actual database query (or whatever).
So I thought I'd start with the simplest form and parse expressions like (foo=bar) and (!foo=bar), but I already have trouble understanding what is going on. I just want to express that a field is separated from its value by a '=', but ANTLR seems to eat all the characters at once because the identifier looks a lot like a value. What do I have to do to prevent this?
grammar FilterExpression;
options
{
language=Java;
k=2;
}
tokens
{
NOT='!';
}
term : '(' NOT? FIELD '=' VALUE ')';
// lexer
FIELD : NAME;
VALUE : CDATA;
fragment NAME
: ALPHA+;
fragment CDATA
: ALPHA*;
fragment ALPHA
: ('a'..'z' | 'A'..'Z');
Upvotes: 1
Views: 769
Reputation: 12195
Okay, you are on the right track here; there are just a few things you need to change. You have to express the field name and the field value in the parser rather than in the lexer, since the lexer has no way to tell the two apart. When several lexer rules match the same input through the same fragment, the lexer cannot determine which token you want. Moving the distinction between name and value into the parser makes it easy, because the parser knows which side of the '=' it is on. To make the value optional, just make that parser element optional (with the '?' after it). The parse tree produced by the modified grammar is shown below (hopefully this is what you were after), and the modified grammar is pasted at the bottom of my answer.
[Image: parse tree produced by the modified grammar] http://img268.imageshack.us/img268/7374/graphw.png
grammar FilterExpression;
options
{
language=Java;
k=2;
}
tokens
{
NOT='!';
}
term : '(' NOT? field '=' value? ')';
// parser
field : ID;
value : ID;
// lexer
ID : ALPHA+
;
fragment ALPHA
: ('a'..'z' | 'A'..'Z');
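To illustrate why the original FIELD and VALUE rules collide, here is a small Python sketch of a lexer that tries its rules in order at each position. It is a simplified model, not ANTLR's actual algorithm. The rule table and the `tokenize` function are made up for this example, and VALUE uses one-or-more letters instead of `ALPHA*` so that it never matches the empty string.

```python
import re

# Simplified model of a lexer: at each position, try the rules in order and
# take the first one that matches. The lexer only sees characters; it knows
# nothing about what the parser expects next.
RULES = [
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("NOT", r"!"),
    ("EQ", r"="),
    ("FIELD", r"[A-Za-z]+"),  # same pattern as VALUE ...
    ("VALUE", r"[A-Za-z]+"),  # ... so this rule can never win
]


def tokenize(text):
    tokens = []
    pos = 0
    while pos < len(text):
        for name, pattern in RULES:
            match = re.compile(pattern).match(text, pos)
            if match:
                tokens.append((name, match.group()))
                pos = match.end()
                break
        else:
            # No rule matched at this position.
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
    return tokens


print(tokenize("(foo=bar)"))
# [('LPAREN', '('), ('FIELD', 'foo'), ('EQ', '='), ('FIELD', 'bar'), ('RPAREN', ')')]
```

In this model both `foo` and `bar` come out as FIELD, so a parser rule that expects a VALUE after the '=' can never succeed. Collapsing the two rules into a single ID token, as in the grammar above, removes the conflict.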
Upvotes: 2
Reputation: 1031
If fields and values are both identifiers, where an identifier is a non-empty string of alphabetic characters (allowing a value to be empty, as in your example), you could do something like:
term : '(' NOT? field '=' value ')';
field : IDENTIFIER ;
value : IDENTIFIER? ;
// lexer
IDENTIFIER : ALPHA+ ;
fragment ALPHA
: ('a'..'z' | 'A'..'Z');
Since the lexer can't tell a field from a value, you'd need to let the lexer treat them the same, and use the parser to tell the difference based on the context.
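The same idea can be sketched by hand in Python. The tokenizer knows only identifiers and punctuation, and the parser decides from position whether an identifier is the field or the value. This is an illustrative hand-written parser, not the code ANTLR generates. The names `tokenize` and `parse_term` and the result dictionary are assumptions made up for this example.

```python
import re

# One token kind for every identifier, plus single-character punctuation.
TOKEN_RE = re.compile(r"[A-Za-z]+|[()!=]")


def tokenize(text):
    """Split the input into identifiers and punctuation tokens."""
    tokens = TOKEN_RE.findall(text)
    if "".join(tokens) != text:
        raise ValueError(f"unexpected character in {text!r}")
    return tokens


def parse_term(text):
    """Parse '(' '!'? field '=' value? ')' where the value may be empty."""
    tokens = tokenize(text)
    pos = 0

    def expect(literal):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != literal:
            raise ValueError(f"expected {literal!r} at token {pos}")
        pos += 1

    def identifier(optional=False):
        nonlocal pos
        if pos < len(tokens) and tokens[pos].isalpha():
            pos += 1
            return tokens[pos - 1]
        if optional:
            return ""
        raise ValueError(f"expected identifier at token {pos}")

    expect("(")
    negated = pos < len(tokens) and tokens[pos] == "!"
    if negated:
        pos += 1
    field = identifier()               # identifier before '=' is the field
    expect("=")
    value = identifier(optional=True)  # identifier after '=' is the value
    expect(")")
    if pos != len(tokens):
        raise ValueError("trailing input after ')'")
    return {"negated": negated, "field": field, "value": value}


print(parse_term("(!foo=bar)"))
# {'negated': True, 'field': 'foo', 'value': 'bar'}
```

The tokenizer never needs to know what an identifier means; only the two `identifier` calls on either side of `expect("=")` assign the roles.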
Upvotes: 0