Reputation: 11
I'm designing a language that allows you to make predicates on data. Here is my lexer.
lexer grammar Studylexer;
fragment LETTER : [A-Za-z];
fragment DIGIT : [0-9];
fragment TWODIGIT : DIGIT DIGIT;
fragment MONTH: ('0' [1-9] | '1' [0-2]);
fragment DAY: ('0' [1-9] | '1' [1-9] | '2' [1-9] | '3' [0-1]);
TIMESTAMP: TWODIGIT ':' TWODIGIT; // representation of the timestamp
DATE : TWODIGIT TWODIGIT MONTH DAY; // representation of the date
ID : LETTER+; // match identifiers
STRING : '"' ( ~ '"' )* '"' ; // match string content
NEWLINE:'\r'? '\n' ; // return newlines to parser (is end-statement signal)
WS : [ \t]+ -> skip ; // toss out whitespace
LIST: ( LISTSTRING | LISTDATE | LISTTIMESTAMP ) ; // list of variables
// list of operators
GT: '>';
LT: '<';
GTEQ: '>=';
LTEQ:'<=';
EQ: '=';
IN: 'in';
fragment LISTSTRING: STRING ',' STRING (',' STRING)*; // list of strings
fragment LISTDATE : DATE ',' DATE (',' DATE)*; // list of dates
fragment LISTTIMESTAMP:TIMESTAMP ',' TIMESTAMP (',' TIMESTAMP )*; // list of timestamps
NAMES: 'filename' | 'timestamp' | 'tso' | 'region' | 'processType' | 'businessDate' | 'lastModificationDate'; // name of variables in the where block
KEY: ID '[' NAMES ']' | ID '.' NAMES; // predicate key
And here is part of my grammar.
expr: KEY op = ('>' | '<') value = ( DATE | TIMESTAMP ) NEWLINE # exprGTORLT
| KEY op = ('>='| '<=') value = ( DATE | TIMESTAMP ) NEWLINE # exprGTEQORLTEQ
| KEY '=' value = ( STRING | DATE | TIMESTAMP ) NEWLINE # exprEQ
| KEY 'in' LIST NEWLINE #exprIn
When I write a predicate, for example:
tab [key] in "value1", "value2"
ANTLR reports an error:
no viable alternative at input tab [key] in
What can I do to resolve this problem?
Upvotes: 0
Views: 1014
Reputation: 370415
First, tab [key] does not produce a KEY token like you want it to, for two reasons:
1. KEY doesn't allow any spaces. The best way to fix that would be to remove the KEY rule from your lexer and turn it into a parser rule instead (meaning you also need to turn [ and ] into their own tokens). The whitespace in your input would then fall between tokens and be skipped successfully.
2. key is not actually one of the words listed in NAMES.
Then another issue is that 'in' is recognized as an ID token, not an IN token. That's because both ID and IN would produce a match of the same length, and in cases like that the rule that's listed first takes precedence. So you should define ID after all of the keywords, because otherwise the keywords will never be matched.
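Putting both fixes together, a minimal sketch of a combined grammar might look like the following. The grammar name Study, the token names LBRACK, RBRACK, DOT and COMMA, and the parser rules key and valueList are assumptions introduced for illustration; they do not appear in the question. Moving the list into a parser rule is also an assumption, made for the same whitespace reason as KEY: a lexer-level LIST cannot contain the space after the comma in "value1", "value2".

```antlr
grammar Study; // hypothetical name; the question uses a separate lexer grammar

// ---- Parser rules ----
expr
    : key op=(GT | LT)     value=(DATE | TIMESTAMP)          NEWLINE # exprGTORLT
    | key op=(GTEQ | LTEQ) value=(DATE | TIMESTAMP)          NEWLINE # exprGTEQORLTEQ
    | key EQ               value=(STRING | DATE | TIMESTAMP) NEWLINE # exprEQ
    | key IN valueList                                       NEWLINE # exprIn
    ;

// KEY is now a parser rule, so skipped whitespace may sit between its tokens.
key
    : ID LBRACK NAMES RBRACK
    | ID DOT NAMES
    ;

// Assumed replacement for the LIST token: two or more values of one kind.
valueList
    : STRING    (COMMA STRING)+
    | DATE      (COMMA DATE)+
    | TIMESTAMP (COMMA TIMESTAMP)+
    ;

// ---- Lexer rules ----
// Keywords come before ID: on a tie in match length, the first rule wins.
IN    : 'in';
NAMES : 'filename' | 'timestamp' | 'tso' | 'region' | 'processType'
      | 'businessDate' | 'lastModificationDate';

GTEQ : '>=';
LTEQ : '<=';
GT   : '>';
LT   : '<';
EQ   : '=';

LBRACK : '[';
RBRACK : ']';
DOT    : '.';
COMMA  : ',';

TIMESTAMP : TWODIGIT ':' TWODIGIT;
DATE      : TWODIGIT TWODIGIT MONTH DAY;
STRING    : '"' ~'"'* '"';
ID        : LETTER+;          // defined after every keyword

NEWLINE : '\r'? '\n';
WS      : [ \t]+ -> skip;

fragment LETTER   : [A-Za-z];
fragment DIGIT    : [0-9];
fragment TWODIGIT : DIGIT DIGIT;
fragment MONTH    : '0' [1-9] | '1' [0-2];
// Also admits 10 and 20, which the pattern in the question skips.
fragment DAY      : '0' [1-9] | [12] [0-9] | '3' [01];
```

With a grammar along these lines, an input such as tab [filename] in "value1", "value2" followed by a newline should reach the exprIn alternative. The original input tab [key] would still be rejected, because key is lexed as an ID rather than as one of the NAMES.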
Upvotes: 0