Artem Bachynskyi

Reputation: 35

ANTLR4: no viable alternative at input 'stringname'

For my research I am creating a programming language with ANTLR4, and I have been struggling for a whole day with a problem where two words seem to become one token after whitespace removal.

This is my ANTLR grammar:

grammar Grammar;

start: (statement ';')*;

//needs expressions extension
statement
    : variable
    | //class
    | if
    | function
    | loop
    | functionCall
    | show
    ;

variable
    : TYPE ID ('=' VAR_TYPE)?
    | ...
    ;

array 
    : TYPE ID '[]' ('=' '[' VAR_TYPE (',' VAR_TYPE)* ']')?
    ;

//needs expressions extension
function
    : (ACCESS TYPE ID '(' ID* ')' '{' 
        (
            variable
            | if
            | loop
            | functionCall
        ) 'return' VAR_TYPE
      '}')
    | (ACCESS 'void' ID '(' ID* ')' '{' 
        (
            variable
            | if
            | loop
            | functionCall
        )
      '}')
    ;

//needs expressions extension
if: 'if' (ID | VAR_TYPE) COMPARISON (ID | VAR_TYPE) ':'
        (
            '\t' variable
            | '\t' if
            | '\t' loop
            | '\t' functionCall
            | '\t' show
        )*
    ('else if' (ID | VAR_TYPE) COMPARISON (ID | VAR_TYPE) ':'
        (
            '\t' variable
            | '\t' if
            | '\t' loop
            | '\t' functionCall
            | '\t' show
        )*
    )*
    ('else' ':'
        (
            '\t' variable
            | '\t' if
            | '\t' loop
            | '\t' functionCall
            | '\t' show
        )*
    )?
    ;

loop: 'foreach' ID 'in' ID ':'
    (
        '\t' variable
        | '\t' if
        | '\t' loop
        | '\t' functionCall
        | '\t' show
    )*
    ;

functionCall: (ID '.')? ID '()';

//needs expressions extension
show: 'show' '(' (ID | VAR_TYPE)? ('+' (ID | VAR_TYPE))* ')';

ACCESS: 'private' | 'public';
COMPARISON: '>' | '<' | '>=' | '<=' | '==';
TYPE: 'int' | 'float' | 'string';
VAR_TYPE: STRING | INT | BOOL | FLOAT | ID;
ID: [a-zA-Z_][a-zA-Z0-9_]* ;
STRING : '"' .*? '"' ;
INT : [0-9]+ ;
BOOL : 'true' | 'false' ;
FLOAT : [0-9]+ '.' [0-9]+ ;
WS : [ \t\r\n]+ -> skip;

This is what the console prints after building the parse tree:

line 1:7 no viable alternative at input 'stringname'
line 2:4 no viable alternative at input 'intage'

And here is the input.txt file I am parsing with this grammar:

string name;
int age;
bool sex;
string children[];

public string returnPerson() {
    return "Name " + name + "\nAge " + age + "\nSex " + sex + "\n";
}

public bool isMinor() {
    if age > 17:
        return false;
    else:
        return true;
}

public void showChildren() {
    int i = 0;
    foreach child in children:
        show("Children №" + (i + 1) + ": " + child + "\n");
}

I basically just don't know what to do with this. I have the whitespace handled, but ANTLR still seems to treat the two words as one token. Also, judging by the output tree, the parser doesn't get past the first two lines of input.txt.

Please help me fix this problem.

Upvotes: 2

Views: 60

Answers (1)

Bart Kiers

Reputation: 170278

Your lexer will never produce an ID token because of this:

VAR_TYPE: STRING | INT | BOOL | FLOAT | ID;
ID: [a-zA-Z_][a-zA-Z0-9_]* ;

VAR_TYPE also matches everything ID matches, and it is defined before ID. ANTLR's lexer works like this:

  1. try to match a rule with as many characters as possible
  2. if 2 (or more) rules match the same number of characters, let the one defined first "win"

Because of rule 2, it is clear that ID will never get matched.
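The two matching rules can be tried out with a small Python sketch. It is a toy model, not how ANTLR is actually implemented: the regex translations of the question's lexer rules and the `tokenize` helper are hypothetical approximations, and `';'` and `'='` stand in for the implicit literal tokens ANTLR generates from the parser rules.

```python
import re

# Toy model of how an ANTLR lexer picks a token (not ANTLR's real,
# ATN-based implementation):
#   1. the rule that matches the most characters wins;
#   2. on a tie, the rule defined first wins.
# Each rule is (name, [alternative regexes]), listed in definition order.
# "';'" and "'='" stand in for the implicit literal tokens ANTLR creates
# from parser rules; those are placed before the grammar's own lexer rules.
QUESTION_RULES = [
    ("';'", [r";"]),
    ("'='", [r"="]),
    ("TYPE", [r"int", r"float", r"string"]),
    ("VAR_TYPE", [r'"[^"]*"', r"[0-9]+", r"true", r"false",
                  r"[0-9]+\.[0-9]+", r"[a-zA-Z_][a-zA-Z0-9_]*"]),
    ("ID", [r"[a-zA-Z_][a-zA-Z0-9_]*"]),
    ("STRING", [r'"[^"]*"']),
    ("INT", [r"[0-9]+"]),
    ("BOOL", [r"true", r"false"]),
    ("FLOAT", [r"[0-9]+\.[0-9]+"]),
    ("WS", [r"[ \t\r\n]+"]),  # -> skip
]


def tokenize(text, rules, skipped=("WS",)):
    """Return (rule_name, text) pairs; rules named in `skipped` are dropped."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, alternatives in rules:
            for alt in alternatives:
                m = re.compile(alt).match(text, pos)
                # Only a strictly longer match replaces the current best, so
                # an equal-length match by a later rule never wins (rule 2).
                if m and m.end() - pos > best_len:
                    best_name, best_len = name, m.end() - pos
        if best_len == 0:
            raise ValueError(f"no lexer rule matches at position {pos}")
        if best_name not in skipped:
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


tokens = tokenize("string name;", QUESTION_RULES)
print(tokens)  # [('TYPE', 'string'), ('VAR_TYPE', 'name'), ("';'", ';')]

# WS is skipped, so nothing separates the first two tokens: their texts
# concatenate to the 'stringname' seen in the error message.
print("".join(text for _, text in tokens[:2]))  # stringname

# ID, STRING, INT, BOOL and FLOAT never appear: VAR_TYPE shadows them all.
print(sorted({name for name, _ in tokenize("int age = 17; bool sex = true;", QUESTION_RULES)}))
# ["';'", "'='", 'TYPE', 'VAR_TYPE']
```

Under these assumptions, name comes out as VAR_TYPE, so a parser rule expecting TYPE ID cannot match it. That is consistent with the reported position line 1:7, since column 7 (counting from 0) is where name starts.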

VAR_TYPE seems a better candidate for a parser rule:

var_type : STRING | INT | BOOL | FLOAT | ID;
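With the same kind of hypothetical toy lexer (regex approximations, not ANTLR itself), dropping the VAR_TYPE lexer rule lets name come out as ID. The helper `is_variable_statement` is an illustrative stand-in for the parser rule `variable : TYPE ID ('=' var_type)?` followed by ';', not generated ANTLR code.

```python
import re

# The question's lexer rules in their original order, minus VAR_TYPE, which
# becomes the parser rule var_type. "';'" and "'='" stand in for ANTLR's
# implicit literal tokens, which come before the grammar's lexer rules.
RULES = [
    ("';'", [r";"]),
    ("'='", [r"="]),
    ("TYPE", [r"int", r"float", r"string"]),
    ("ID", [r"[a-zA-Z_][a-zA-Z0-9_]*"]),
    ("STRING", [r'"[^"]*"']),
    ("INT", [r"[0-9]+"]),
    ("BOOL", [r"true", r"false"]),
    ("FLOAT", [r"[0-9]+\.[0-9]+"]),
    ("WS", [r"[ \t\r\n]+"]),  # -> skip
]


def tokenize(text, rules, skipped=("WS",)):
    """Toy lexer: longest match wins, ties go to the rule defined first."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, alternatives in rules:
            for alt in alternatives:
                m = re.compile(alt).match(text, pos)
                if m and m.end() - pos > best_len:  # strictly longer only
                    best_name, best_len = name, m.end() - pos
        if best_len == 0:
            raise ValueError(f"no lexer rule matches at position {pos}")
        if best_name not in skipped:
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


# var_type : STRING | INT | BOOL | FLOAT | ID;
VAR_TYPE = {"STRING", "INT", "BOOL", "FLOAT", "ID"}


def is_variable_statement(tokens):
    """Toy check for: TYPE ID ('=' var_type)? ';'"""
    kinds = [name for name, _ in tokens]
    if kinds[:2] != ["TYPE", "ID"] or kinds[-1:] != ["';'"]:
        return False
    middle = kinds[2:-1]
    return middle == [] or (len(middle) == 2 and middle[0] == "'='"
                            and middle[1] in VAR_TYPE)


print(tokenize("string name;", RULES))  # [('TYPE', 'string'), ('ID', 'name'), ("';'", ';')]
print(is_variable_statement(tokenize("string name;", RULES)))  # True
print(is_variable_statement(tokenize("int i = 0;", RULES)))    # True
```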

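But there are quite a few other things wrong with the grammar you posted. If you define '()' in your grammar, then the input () (with nothing in between) is always tokenized as a single '()' token, never as a '(' token followed by a ')' token. For every literal used inside a parser rule, ANTLR implicitly creates a lexer rule, placed before your own lexer rules, like this: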

functionCall: (ID '.')? ID '()';
show: 'show' '(' (ID | VAR_TYPE)? ('+' (ID | VAR_TYPE))* ')';

T__0 : '.';
T__1 : '()';
T__2 : 'show';
T__3 : '(';
T__4 : ')';
...

If you now try to parse the input:

public string returnPerson() {
    return "Name " + name + "\nAge " + age + "\nSex " + sex + "\n";
}

using the parser rule:

function
 : ACCESS TYPE ID '(' ...
 ;

it will fail, because () is tokenized as a T__1 token, not as T__3 and T__4 tokens.
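The effect of the implicit '()' token can be reproduced with the same kind of toy lexer. The rule list is an assumption: the T__ names simply follow the listing above, VAR_TYPE is left out so that ID can match, and the regexes and the `tokenize` helper are hypothetical approximations rather than ANTLR's generated lexer.

```python
import re

# Implicit literal tokens from the listing above come first, followed by a
# few of the question's lexer rules (VAR_TYPE omitted so that ID can match).
RULES = [
    ("T__0", [r"\."]),
    ("T__1", [r"\(\)"]),
    ("T__2", [r"show"]),
    ("T__3", [r"\("]),
    ("T__4", [r"\)"]),
    ("ACCESS", [r"private", r"public"]),
    ("TYPE", [r"int", r"float", r"string"]),
    ("ID", [r"[a-zA-Z_][a-zA-Z0-9_]*"]),
    ("WS", [r"[ \t\r\n]+"]),  # -> skip
]


def tokenize(text, rules, skipped=("WS",)):
    """Toy lexer: longest match wins, ties go to the rule defined first."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, alternatives in rules:
            for alt in alternatives:
                m = re.compile(alt).match(text, pos)
                if m and m.end() - pos > best_len:  # strictly longer only
                    best_name, best_len = name, m.end() - pos
        if best_len == 0:
            raise ValueError(f"no lexer rule matches at position {pos}")
        if best_name not in skipped:
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


# '()' is two characters and '(' only one, so () always becomes one T__1.
print(tokenize("public string returnPerson()", RULES))
# [('ACCESS', 'public'), ('TYPE', 'string'), ('ID', 'returnPerson'), ('T__1', '()')]

# With something between the parentheses, '(' and ')' are separate tokens.
print(tokenize("show(name)", RULES))
# [('T__2', 'show'), ('T__3', '('), ('ID', 'name'), ('T__4', ')')]

# Longest match also keeps the literal 'show' from splitting a longer ID.
print(tokenize("showChildren()", RULES))
# [('ID', 'showChildren'), ('T__1', '()')]
```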

EDIT

Also, BOOL : 'true' | 'false' ; will never get matched because of the 2 matching rules I mentioned earlier (true and false are matched as VAR_TYPE tokens instead).
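The same toy lexer (a hypothetical helper with regex approximations, not ANTLR itself) shows why rule order matters for BOOL: the earlier rule only wins on a tie, so BOOL has to be defined before ID, as it is in the edited grammar. A longer identifier such as trueish is still an ID, because the longest match is decided first.

```python
import re


def tokenize(text, rules, skipped=("WS",)):
    """Toy lexer: longest match wins, ties go to the rule defined first."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, alternatives in rules:
            for alt in alternatives:
                m = re.compile(alt).match(text, pos)
                if m and m.end() - pos > best_len:  # strictly longer only
                    best_name, best_len = name, m.end() - pos
        if best_len == 0:
            raise ValueError(f"no lexer rule matches at position {pos}")
        if best_name not in skipped:
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


ID_RULE = ("ID", [r"[a-zA-Z_][a-zA-Z0-9_]*"])
BOOL_RULE = ("BOOL", [r"true", r"false"])
WS_RULE = ("WS", [r"[ \t\r\n]+"])  # -> skip

# BOOL defined before ID: BOOL wins the tie on 'true'; 'trueish' is longer, so ID.
print(tokenize("true trueish", [BOOL_RULE, ID_RULE, WS_RULE]))
# [('BOOL', 'true'), ('ID', 'trueish')]

# BOOL defined after ID: ID wins every tie and BOOL is never produced.
print(tokenize("true false", [ID_RULE, BOOL_RULE, WS_RULE]))
# [('ID', 'true'), ('ID', 'false')]
```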

Here's a quick edit of your grammar so that it will correctly parse your input:

grammar Grammar;

start : statement* EOF;

statement
 : variable ';'
 | array ';'
 | if
 | function
 | loop
 | functionCall ';'
 | show ';'
 | 'return' expression ';'
 ;

function
 : ACCESS TYPE ID '(' ID* ')' '{' statement* '}'
 | ACCESS 'void' ID '(' ID* ')' '{' statement* '}'
 ;

variable     : TYPE ID ('=' expression)?;
array        : TYPE ID '[' ']' ('=' '[' expression (',' expression)* ']')?;
if           : 'if' expression ':' statement* ('else if' expression ':' statement*)* ('else' ':' statement*)?;
loop         : 'foreach' ID 'in' expression ':' statement*;
functionCall : (ID '.')? ID '(' ')';
show         : 'show' '(' expression ')';

expression
 : '(' expression ')'
 | expression '+' expression
 | expression COMPARISON expression
 | STRING
 | ID
 | INT
 | BOOL
 | FLOAT
 ;

ACCESS     : 'private' | 'public';
COMPARISON : '>' | '<' | '>=' | '<=' | '==';
TYPE       : 'int' | 'float' | 'string' | 'bool';
BOOL       : 'true' | 'false' ;
ID         : [a-zA-Z_][a-zA-Z0-9_]* ;
STRING     : '"' (~[\\"] | '\\' .)* '"';
INT        : [0-9]+;
FLOAT      : [0-9]+ '.' [0-9]+;
WS         : [ \t\r\n]+ -> channel(HIDDEN);
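The edited grammar also replaces the question's STRING rule '"' .*? '"' with '"' (~[\\"] | '\\' .)* '"'. The answer does not spell out why; one visible effect is that an escaped quote no longer ends the string. The Python regexes below are hypothetical approximations of the two ANTLR rules, not generated lexer code.

```python
import re

# Approximations of the two ANTLR STRING rules. DOTALL lets '.' match
# newlines too, like ANTLR's '.' wildcard.
QUESTION_STRING = re.compile(r'".*?"', re.DOTALL)            # '"' .*? '"'
ANSWER_STRING = re.compile(r'"(?:[^\\"]|\\.)*"', re.DOTALL)  # '"' (~[\\"] | '\\' .)* '"'

source = r'"say \"hi\" now"'  # a string literal containing escaped quotes

# The non-greedy rule stops at the first quote, even an escaped one.
print(QUESTION_STRING.match(source).group())  # "say \"

# The escape-aware rule consumes \" as one unit and reaches the real end.
print(ANSWER_STRING.match(source).group())    # "say \"hi\" now"
```

Literals from input.txt such as "\nAge " contain no escaped quote, so both rules match them in full.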


Upvotes: 3
