Reputation: 748
I wrote the following combined grammar:
grammar KeywordGrammar;
options{
TokenLabelType = MyToken;
}
//start rule
start: sequence+ EOF;
sequence: keyword filter?;
filter: simpleFilter | logicalFilter | rangeFilter;
logicalFilter: andFilter | orFilter | notFilter;
simpleFilter: lessFilter | greatFilter | equalFilter | containsFilter;
andFilter: simpleFilter AND? simpleFilter;
orFilter: simpleFilter OR simpleFilter;
lessFilter: LESS (DIGIT | FLOAT|DATE);
notFilter: NOT IN? (STRING|ID);
greatFilter: GREATER (DIGIT|FLOAT|DATE);
equalFilter: EQUAL (DIGIT|FLOAT|DATE);
containsFilter: EQUAL (STRING|ID);
rangeFilter: RANGE? DATE DATE? | RANGE? FLOAT FLOAT?;
keyword: ID | STRING;
DATE: DIGIT DIGIT? SEPARATOR MONTH SEPARATOR DIGIT DIGIT (DIGIT DIGIT)?;
MONTH: JAN
| FEV
| MAR
| APR
| MAY
| JUN
| JUL
| AUG
| SEP
| OCT
| NOV
| DEC
;
JAN : 'janeiro'|'jan'|'01'|'1';
FEV : 'fevereiro'|'fev'|'02'|'2';
MAR : 'março'|'mar'|'03'|'3';
APR : 'abril' |'abril'|'04'|'4';
MAY : 'maio'| 'mai'| '05'|'5';
JUN : 'junho'|'jun'|'06'|'6';
JUL : 'julho'|'jul'|'07'|'7';
AUG : 'agosto'|'ago'|'08'|'8';
SEP : 'setembro'|'set'|'09'|'9';
OCT : 'outubro'|'out'|'10';
NOV : 'novembro'|'nov'|'11';
DEC : 'dezembro'|'dez'|'12';
SEPARATOR: '/'|'-';
AND: ('e'|'E');
OR: ('O'|'o')('U'|'u');
NOT: ('N'|'n')('Ã'|'ã')('O'|'o');
IN: ('E'|'e')('M'|'m');
GREATER: '>' | ('m'|'M')('a'|'A')('i'|'I')('o'|'O')('r'|'R') ;
LESS: '<' | ('m'|'M')('e'|'E')('n'|'N')('o'|'O')('r'|'R');
EQUAL: '=' | ('i'|'I')('g'|'G')('u'|'U')('a'|'A')('l'|'L');
RANGE: ('e'|'E')('n'|'N')('t'|'T')('r'|'R')('e'|'E');
FLOAT: DIGIT+ | DIGIT+ POINT DIGIT+;
ID: (LETTER|DIGIT+ SYMBOL) (LETTER|SYMBOL|DIGIT)*;
STRING: '"' ( ESC_SEQ | ~('\\'|'"') )* '"';
DIGIT: [0-9];
WS: (' '
| '\t'
| '\r'
| '\n') -> skip
;
POINT: '.' | ',';
fragment
LETTER: 'A'..'Z'
| 'a'..'z'
| '\u00C0'..'\u00D6'
| '\u00D8'..'\u00F6'
| '\u00F8'..'\u02FF'
| '\u0370'..'\u037D'
| '\u037F'..'\u1FFF'
| '\u200C'..'\u200D'
| '\u2070'..'\u218F'
| '\u2C00'..'\u2FEF'
| '\u3001'..'\uD7FF'
| '\uF900'..'\uFDCF'
| '\uFDF0'..'\uFFFD'
;
fragment
SYMBOL: '-' | '_';
fragment
HEX_DIGIT: ('0'..'9'|'a'..'f'|'A'..'F');
fragment
ESC_SEQ: '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
| UNICODE_ESC
| OCTAL_ESC
;
fragment
OCTAL_ESC: '\\' ('0'..'3') ('0'..'7') ('0'..'7')
| '\\' ('0'..'7') ('0'..'7')
| '\\' ('0'..'7')
;
fragment
UNICODE_ESC: '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT;
But a "no viable alternative at input" error occurs only when I try to parse sentences of the form keyword OPERATOR DIGIT; for example:
With zero as the value, it works!
Where is the error?
Thanks for your help,
Yenier
Upvotes: 1
Views: 11693
Reputation: 2698
You have a lot of ambiguity in your lexer rules. What specifically breaks your case is that the digits 1-9 can be matched by DIGIT as well as by MONTH, JAN, etc. The digit 0 is immune to this problem. Use grun with -tokens to diagnose problems of the sort you encountered:
$ grun KeywordGrammar start -tokens
filter = 0
[@0,0:5='filter',<24>,1:0]
[@1,7:7='=',<21>,1:7]
[@2,9:9='0',<23>,1:9]
[@3,11:10='<EOF>',<-1>,2:0]
$ grun KeywordGrammar start -tokens
filter = 2
[@0,0:5='filter',<24>,1:0]
[@1,7:7='=',<21>,1:7]
[@2,9:9='2',<1>,1:9]
[@3,11:10='<EOF>',<-1>,2:0]
line 1:9 no viable alternative at input '=2'
As you can see, 0 in the first case has token type <23>, while 2 in the second case has token type <1>. Look at your generated KeywordGrammar.tokens:
MONTH=1
JAN=2
...
FLOAT=23
...
So it is not a DIGIT or FLOAT; it is MONTH. As a result, your filter rule does not match. And yes, the order of rules matters: when several lexer rules match the same longest input, ANTLR picks the one defined first.
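The tie-break described above can be modelled in a few lines of Python. This is only a sketch of the selection rule (longest match wins, earliest definition wins ties), not ANTLR's implementation; the regular expressions are simplified stand-ins for three of the question's rules, and pick_token is a hypothetical helper name.

```python
import re

# Simplified stand-ins for three lexer rules, listed in the same relative
# order as in the question's grammar (MONTH, then FLOAT, then DIGIT).
# Assumption: only the January/February alternatives of MONTH are modelled.
RULES = [
    ("MONTH", r"janeiro|jan|01|1|fevereiro|fev|02|2"),
    ("FLOAT", r"[0-9]+(?:[.,][0-9]+)?"),
    ("DIGIT", r"[0-9]"),
]


def pick_token(text, rules=RULES):
    """Return the name of the rule chosen for the start of `text`.

    The longest match wins; when lengths are equal, the rule that
    appears first in `rules` wins.
    """
    best_name, best_len = None, 0
    for name, pattern in rules:
        m = re.match(pattern, text)
        # Strict '>' keeps the earlier rule when two matches tie on length.
        if m and len(m.group()) > best_len:
            best_name, best_len = name, len(m.group())
    return best_name


print(pick_token("0"))   # MONTH does not match; FLOAT precedes DIGIT
print(pick_token("2"))   # all three match one character; MONTH is first
print(pick_token("25"))  # FLOAT matches two characters, the longest
```

In this model "0" comes out as FLOAT and "2" as MONTH, which lines up with the token types <23> and <1> in the grun output.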
Remove the ambiguity from the lexer. Make months and similar tokens into grammar rules. There are plenty of other trouble spots too: for example, your FLOAT makes it impossible for DIGIT to appear standalone, yet you still refer to DIGIT alongside FLOAT in the parser rules. If DIGIT has no significance at the grammar level, make it a fragment and use only FLOAT in parser rules.
And make it a habit to use grun and/or the ANTLR plugins for your IDE to make sure you know what your lexers and parsers actually see.
Upvotes: 2
Reputation: 748
While testing, I saw that the problem disappears when the FLOAT token definition is placed before the DATE definition.
...
FLOAT: DIGIT+ (POINT DIGIT+)?;
DATE: DIGIT DIGIT? SEPARATOR MONTH SEPARATOR DIGIT DIGIT (DIGIT DIGIT)?;
...
I do not know why. Does the order matter?
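A rough way to see why the order could matter: in the original grammar DATE is defined before MONTH, so moving FLOAT ahead of DATE also moves it ahead of MONTH. The Python sketch below compares the two orderings under the assumption that the lexer takes the longest match and breaks ties by definition order. The patterns are simplified stand-ins and first_longest is a hypothetical helper, not part of ANTLR.

```python
import re

# Simplified, assumed patterns for three of the lexer rules.
PATTERNS = {
    "MONTH": r"01|1|02|2",
    "FLOAT": r"[0-9]+(?:[.,][0-9]+)?",
    "DIGIT": r"[0-9]",
}


def first_longest(text, order):
    """Return the rule in `order` with the longest match at the start of
    `text`; among equally long matches the earliest rule in `order` wins."""
    lengths = []
    for name in order:
        m = re.match(PATTERNS[name], text)
        if m:
            lengths.append((len(m.group()), name))
    if not lengths:
        return None
    longest = max(length for length, _ in lengths)
    # `lengths` preserves definition order, so next() returns the earliest.
    return next(name for length, name in lengths if length == longest)


original = ["MONTH", "FLOAT", "DIGIT"]   # relative order in the question
reordered = ["FLOAT", "MONTH", "DIGIT"]  # FLOAT moved to the front

print(first_longest("2", original))   # MONTH wins the tie
print(first_longest("2", reordered))  # FLOAT now wins the tie
```

In both orderings DIGIT is never selected in this model, because FLOAT always matches at least as much input and is defined earlier.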
Upvotes: 0