Skrodde
Skrodde

Reputation: 41

Generated Antlr Parser in Java: Not all inputs are read

I am working on my Antlr grammar to parse polynomial functions in multiple variables using Java. Examples for legal input are

42; X; +42X; Y^42; 1337HelloWorld; 13,37X^42; 

The following grammar does compile without warnings or errors:

grammar Function;

parseFunction returns [java.util.List<java.util.List<Object>> list] :   
    { list = new java.util.ArrayList(); }                                              ( f=functionPart { list.add($f.list); } )+
|   { list = new java.util.ArrayList(); } ( fb=functionBegin ) { list.add($fb.list); } ( f=functionPart { list.add($f.list); } )*
;

functionBegin returns [java.util.List<Object> list]:
m=NUMBER v=VARIABLE e=exponent  { list = new java.util.ArrayList(); list.add("+"); list.add($m.text); list.add($v.text); list.add($e.value); }
| m=NUMBER v=VARIABLE           { list = new java.util.ArrayList(); list.add("+"); list.add($m.text); list.add($v.text); }
| v=VARIABLE e=exponent         { list = new java.util.ArrayList(); list.add("+"); list.add("1");     list.add($v.text); list.add($e.value); }  
| v=VARIABLE                    { list = new java.util.ArrayList(); list.add("+"); list.add("1");     list.add($v.text); }
| m=NUMBER                      { list = new java.util.ArrayList(); list.add("+"); list.add($m.text); }
;

functionPart returns [java.util.List<Object> list] :    
s=SIGN m=NUMBER v=VARIABLE e=exponent   { list = new java.util.ArrayList(); list.add($s.text); list.add($m.text); list.add($v.text); list.add($e.value); }
| s=SIGN m=NUMBER v=VARIABLE            { list = new java.util.ArrayList(); list.add($s.text); list.add($m.text); list.add($v.text); }
| s=SIGN v=VARIABLE e=exponent          { list = new java.util.ArrayList(); list.add($s.text); list.add("1");     list.add($v.text); list.add($e.value); }
| s=SIGN v=VARIABLE                     { list = new java.util.ArrayList(); list.add($s.text); list.add("1");     list.add($v.text); }
| s=SIGN m=NUMBER                       { list = new java.util.ArrayList(); list.add($s.text); list.add($m.text); }
;

exponent returns [int value]: ('^' n=INTEGER) { $value = 1; if ( $n != null && $n.text.length() > 0) $value = Integer.parseInt($n.text); }
;

VARIABLE    : ('a'..'z'|'A'..'Z')+
;

INTEGER : ('0'..'9')+
;

NUMBER  : ('0'..'9')+ (','('0'..'9')+)?
;

SIGN    :   ('+'|'-')
;

WS  :    (' ' | '\t' | '\r'| '\n')+ {skip();} 
;

This grammar, if compiled and used in Java does accept most input values. Apparently, not all valid input values are accepted. As soon as a number not using a comma pops up, like the inputs

+42; 42; 42X^1337; 

an error is thrown (error from input "+42"):

line 1:1 no viable alternative at input '+'

The error is not thrown if I modify the inputs to

+42,0; 42,0; 42,0X^1337

Can anyone say, why and how to fix it?

Upvotes: 0

Views: 187

Answers (1)

Gunther
Gunther

Reputation: 5256

The first lexer rule with the longest match wins, thus 42 is an INTEGER, and NUMBER in fact only matches when the comma part is present, i.e. when NUMBER has a longer match than INTEGER.

This can be fixed by adding a parser rule

number : NUMBER | INTEGER ;

and using that instead of NUMBER from other parser rules.

Upvotes: 2

Related Questions