Yenier Torres

Reputation: 748

Why does 'no viable alternative at input' occur?

I wrote the following combined grammar:

grammar KeywordGrammar;

options{
TokenLabelType = MyToken;
}

//start rule
start: sequence+ EOF;

sequence: keyword filter?;

filter: simpleFilter | logicalFilter | rangeFilter;

logicalFilter: andFilter | orFilter | notFilter;

simpleFilter: lessFilter | greatFilter | equalFilter | containsFilter;

andFilter: simpleFilter AND? simpleFilter;

orFilter: simpleFilter OR simpleFilter;

lessFilter: LESS (DIGIT | FLOAT|DATE);

notFilter: NOT IN? (STRING|ID);

greatFilter: GREATER (DIGIT|FLOAT|DATE);

equalFilter: EQUAL (DIGIT|FLOAT|DATE);

containsFilter: EQUAL (STRING|ID);

rangeFilter:  RANGE? DATE DATE? | RANGE? FLOAT FLOAT?; 

keyword: ID | STRING;

DATE: DIGIT DIGIT? SEPARATOR MONTH SEPARATOR DIGIT DIGIT (DIGIT DIGIT)?;

MONTH: JAN
     | FEV 
     | MAR
     | APR
     | MAY
     | JUN
     | JUL
     | AUG
     | SEP
     | OCT
     | NOV
     | DEC
     ;

JAN : 'janeiro'|'jan'|'01'|'1';
FEV : 'fevereiro'|'fev'|'02'|'2';
MAR : 'março'|'mar'|'03'|'3';
APR : 'abril'|'abr'|'04'|'4';
MAY : 'maio'| 'mai'| '05'|'5';
JUN : 'junho'|'jun'|'06'|'6';
JUL : 'julho'|'jul'|'07'|'7';
AUG : 'agosto'|'ago'|'08'|'8';
SEP : 'setembro'|'set'|'09'|'9';
OCT : 'outubro'|'out'|'10';
NOV : 'novembro'|'nov'|'11';
DEC : 'dezembro'|'dez'|'12';

SEPARATOR: '/'|'-';

AND: ('e'|'E');

OR: ('O'|'o')('U'|'u');

NOT: ('N'|'n')('Ã'|'ã')('O'|'o');

IN: ('E'|'e')('M'|'m');

GREATER: '>' | ('m'|'M')('a'|'A')('i'|'I')('o'|'O')('r'|'R') ;

LESS: '<' | ('m'|'M')('e'|'E')('n'|'N')('o'|'O')('r'|'R');

EQUAL: '=' | ('i'|'I')('g'|'G')('u'|'U')('a'|'A')('l'|'L');

RANGE: ('e'|'E')('n'|'N')('t'|'T')('r'|'R')('e'|'E');

FLOAT: DIGIT+ | DIGIT+ POINT DIGIT+;

ID: (LETTER|DIGIT+ SYMBOL) (LETTER|SYMBOL|DIGIT)*;

STRING: '"' ( ESC_SEQ | ~('\\'|'"') )* '"';

DIGIT: [0-9];

WS: (' '
  |  '\t'
  |  '\r'
  |  '\n') -> skip
  ;

POINT: '.' | ',';

fragment 
LETTER: 'A'..'Z'
      | 'a'..'z'
      | '\u00C0'..'\u00D6'
      | '\u00D8'..'\u00F6'
      | '\u00F8'..'\u02FF'
      | '\u0370'..'\u037D'
      | '\u037F'..'\u1FFF'
      | '\u200C'..'\u200D'
      | '\u2070'..'\u218F'
      | '\u2C00'..'\u2FEF'
      | '\u3001'..'\uD7FF'
      | '\uF900'..'\uFDCF'
      | '\uFDF0'..'\uFFFD'
      ;

fragment 
SYMBOL: '-' | '_';

fragment
HEX_DIGIT: ('0'..'9'|'a'..'f'|'A'..'F');

fragment
ESC_SEQ: '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
       | UNICODE_ESC
       | OCTAL_ESC
       ;

fragment
OCTAL_ESC: '\\' ('0'..'3') ('0'..'7') ('0'..'7')
         | '\\' ('0'..'7') ('0'..'7')
         | '\\' ('0'..'7')
         ;

fragment
UNICODE_ESC: '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT;

But a 'no viable alternative at input' error occurs only when I try to parse sentences of the following form: keyword OPERATOR DIGIT; for example:

With zero as the value, it works!

Where is the error?

Thanks for your help,

Yenier

Upvotes: 1

Views: 11693

Answers (2)

Yuri Steinschreiber

Reputation: 2698

You have a lot of ambiguity in your lexer rules. What specifically breaks your case is that the digits 1-9 can be matched both by DIGIT and by MONTH, JAN, etc. The digit 0 is immune to this problem. Use grun with -tokens to diagnose problems of the sort you encountered:

$ grun KeywordGrammar start -tokens
filter = 0
[@0,0:5='filter',<24>,1:0]
[@1,7:7='=',<21>,1:7]
[@2,9:9='0',<23>,1:9]
[@3,11:10='<EOF>',<-1>,2:0]

$ grun KeywordGrammar start -tokens
filter = 2
[@0,0:5='filter',<24>,1:0]
[@1,7:7='=',<21>,1:7]
[@2,9:9='2',<1>,1:9]
[@3,11:10='<EOF>',<-1>,2:0]
line 1:9 no viable alternative at input '=2'

As you can see, 0 in the first case has token type <23>, while 2 in the second case has token type <1>. Look at your generated KeywordGrammar.tokens:

MONTH=1
JAN=2
...
FLOAT=23
...

So it is not a DIGIT or FLOAT; it is MONTH. As a result, your filter rule does not match. And yes, the order of rules matters: when several lexer rules match the same longest stretch of input, ANTLR picks the rule that is defined first.
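The tie-breaking behaviour described above can be imitated in a few lines of Python. This is only a rough sketch, not ANTLR itself: `first_token` is a hypothetical helper, and the two regular expressions are heavily reduced stand-ins for the question's MONTH and FLOAT rules (month names and DATE are left out). It assumes the lexer prefers the longest match and, on a tie, the rule listed first.

```python
import re

def first_token(text, rules):
    """Return (rule_name, lexeme) for the token at the start of text.

    Simplified model of the lexer's choice: the longest match wins, and
    among equally long matches the rule listed first wins.
    """
    best = None
    for name, pattern in rules:
        m = re.match(pattern, text)
        # Strict '>' keeps the earlier rule when two matches tie in length.
        if m and m.end() > 0 and (best is None or m.end() > len(best[1])):
            best = (name, m.group())
    return best

# Assumed, reduced versions of the question's rules (numeric forms only).
MONTH = ("MONTH", r"1[0-2]|0?[1-9]")
FLOAT = ("FLOAT", r"[0-9]+([.,][0-9]+)?")

month_first = [MONTH, FLOAT]  # MONTH defined before FLOAT
float_first = [FLOAT, MONTH]  # FLOAT defined before MONTH

print(first_token("2", month_first))    # ('MONTH', '2')  tie, first rule wins
print(first_token("0", month_first))    # ('FLOAT', '0')  MONTH cannot match 0
print(first_token("2", float_first))    # ('FLOAT', '2')  same tie, other order
print(first_token("2.5", month_first))  # ('FLOAT', '2.5') longest match wins
```

In this model a lone 2 flips between MONTH and FLOAT purely because of rule order, while 0 and 2.5 are unaffected, which mirrors the token types shown in the grun output.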

Remove the ambiguity from the lexer. Make months and similar tokens into grammar rules. There are plenty of other such places: for example, your FLOAT makes it impossible for DIGIT to appear on its own, yet you still refer to DIGIT alongside FLOAT in the rules. If DIGIT has no significance at the grammar level, make it a fragment and use only FLOAT in parser rules.

And make it a habit to use grun and/or an ANTLR plugin for your IDE, so that you know what your lexers and parsers actually see.

Upvotes: 2

Yenier Torres

Reputation: 748

While testing, I saw that the problem disappears when the FLOAT token definition is placed before the DATE definition.

...
FLOAT: DIGIT+ (POINT DIGIT+)?;

DATE: DIGIT DIGIT? SEPARATOR MONTH SEPARATOR DIGIT DIGIT (DIGIT DIGIT)?;
...

I do not know why. Does the order matter?

Upvotes: 0
