Amr Ellafy

Reputation: 740

ANTLR4: matching token with same rule but with different position in the grammar

I have the following statement I wish to parse:

in(name,(Silver,Gold))

The parser always gets confused because ID and the string elements of the array are matched by the same lexer rule. Putting the strings in single or double quotes would help, but the input is not quoted in my case.

I also tried predicates, but they didn't help much.

The grammar:

grammar Rql;

statement
 : EOF
 | query EOF
 ;

query
 : function
 ;

function
 : FUNCTION_IN OPAR id COMMA OPAR array CPAR CPAR
 ;

array
 : VALUE (COMMA VALUE)*
 ;

FUNCTION_IN: 'in';

id
 : {in(}? ID
 ;

ID
 : [a-zA-Z_] [a-zA-Z_0-9]*
 ;

VALUE
 : STRING
 | INT
 | FLOAT
 ;

OPAR : '(';
CPAR : ')';
COMMA : ',';

INT
 : [0-9]+
 ;

FLOAT
 : [0-9]+ '.' [0-9]*
 | '.' [0-9]+
 ;

SPACE
 : [ \t\r\n] -> skip
 ;

STRING
 :  [a-zA-Z_] [a-zA-Z_0-9]*
 ;

OTHER
 : .
 ;

Upvotes: 0

Views: 1177

Answers (1)

BernardK

Reputation: 3744

The idea is to change the type of the token under some condition. Here, seeing an ID for the first time in a line sets a switch to true. The next time an ID is matched, the lexer executes the if and sets the type to ID_VALUE. I wanted to reset the switch on entering the parser rule function, but that doesn't work:

function
@init {QuestionLexer.id_seen = false; System.out.println("id_seen has been reset" + QuestionLexer.id_seen);}
 : FUNCTION_IN OPAR ID COMMA OPAR array CPAR CPAR

ID=name1  seen ? false
ID=Silver  seen ? true
...
ID=Platinum  seen ? true
[@0,0:1='in',<'in'>,1:0]
[@1,2:2='(',<'('>,1:2]
[@2,3:7='name1',<ID>,1:3]
[@3,8:8=',',<','>,1:8]
[@4,9:9='(',<'('>,1:9]
[@5,10:15='Silver',<10>,1:10]
...
[@12,27:31='name2',<10>,2:3]
...
[@20,52:51='<EOF>',<EOF>,3:0]
Question last update 1336
id_seen has been reset false
id_seen has been reset false
line 2:3 mismatched input 'name2' expecting ID

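In this run all the ID=... trace lines appear before the two reset messages: the whole input had already been tokenized when the parser entered function, so resetting the switch there came too late and name2 was still typed <10> instead of ID.

That's why I reset it in the FUNCTION_IN lexer rule instead.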

Grammar Question.g4:

grammar Question;

@lexer::members {
    static boolean id_seen = false;
}

tokens { ID_VALUE }

question
@init {System.out.println("Question last update 1352");}
 : function+ EOF
 ;

function
 : FUNCTION_IN OPAR ID COMMA OPAR array CPAR CPAR
 ;

array
 : value (COMMA value)*
 ;

value
 : ID_VALUE
 | INT
 | FLOAT
 ;

FUNCTION_IN: 'in' {id_seen = false;} ;

ID : [a-zA-Z_] [a-zA-Z_0-9]*
     {
       System.out.println("ID=" + getText() + "  seen ? " + id_seen);
       // Once an ID has been seen since the last FUNCTION_IN, retag later ones as ID_VALUE.
       if (id_seen) setType(QuestionParser.ID_VALUE);
       id_seen = true;
     } ;

OPAR : '(';
CPAR : ')';
COMMA : ',';

INT
 : [0-9]+
 ;

FLOAT
 : [0-9]+ '.' [0-9]*
 | '.' [0-9]+
 ;

SPACE
 : [ \t\r\n] -> skip
 ;

OTHER
 : .
 ;
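
The effect of the two lexer actions (reset on FUNCTION_IN, retag in ID) can be tried out without ANTLR. What follows is only a sketch in plain Java under stated assumptions: the class IdSwitchSketch and the methods tag and tagAll are hypothetical names made up for illustration, and the lexemes are split by hand instead of by the generated lexer.

```java
import java.util.ArrayList;
import java.util.List;

public class IdSwitchSketch {
    // Mirrors the id_seen switch declared in @lexer::members (hypothetical stand-in).
    private boolean idSeen = false;

    // Returns the token type name the grammar's actions would assign to one lexeme.
    String tag(String text) {
        if (text.equals("in")) {
            idSeen = false;                 // FUNCTION_IN action: reset the switch
            return "FUNCTION_IN";
        }
        if (text.matches("[a-zA-Z_][a-zA-Z_0-9]*")) {
            String type = idSeen ? "ID_VALUE" : "ID";   // retag every ID after the first
            idSeen = true;
            return type;
        }
        return "OTHER";                     // punctuation and numbers are not modelled here
    }

    // Tags a hand-split sequence of lexemes with a fresh switch.
    static List<String> tagAll(String... lexemes) {
        IdSwitchSketch sketch = new IdSwitchSketch();
        List<String> out = new ArrayList<>();
        for (String lexeme : lexemes) {
            out.add(lexeme + ":" + sketch.tag(lexeme));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tagAll("in", "name1", "Silver", "Gold",
                                  "in", "name2", "Copper", "Platinum"));
    }
}
```

In the sketch, name1 and name2 come out as ID and the four metal names as ID_VALUE. It also makes one limitation of the approach visible: a value spelled exactly in would be matched as FUNCTION_IN and reset the switch, just as in the grammar, where FUNCTION_IN is declared before ID.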

File t.text:

in(name1,(Silver,Gold))
in(name2,(Copper,Platinum))

Execution with ANTLR 4.6:

$ grun Question question -tokens -diagnostics t.text
ID=name1  seen ? false
ID=Silver  seen ? true
ID=Gold  seen ? true
ID=name2  seen ? false
ID=Copper  seen ? true
ID=Platinum  seen ? true
[@0,0:1='in',<'in'>,1:0]
[@1,2:2='(',<'('>,1:2]
[@2,3:7='name1',<ID>,1:3]
[@3,8:8=',',<','>,1:8]
[@4,9:9='(',<'('>,1:9]
[@5,10:15='Silver',<10>,1:10]
[@6,16:16=',',<','>,1:16]
[@7,17:20='Gold',<10>,1:17]
[@8,21:21=')',<')'>,1:21]
[@9,22:22=')',<')'>,1:22]
[@10,24:25='in',<'in'>,2:0]
[@11,26:26='(',<'('>,2:2]
[@12,27:31='name2',<ID>,2:3]
[@13,32:32=',',<','>,2:8]
[@14,33:33='(',<'('>,2:9]
[@15,34:39='Copper',<10>,2:10]
[@16,40:40=',',<','>,2:16]
[@17,41:48='Platinum',<10>,2:17]
[@18,49:49=')',<')'>,2:25]
[@19,50:50=')',<')'>,2:26]
[@20,52:51='<EOF>',<EOF>,3:0]
Question last update 1352

Type <10> is ID_VALUE, as can be seen in the .tokens file:

$ cat Question.tokens 
FUNCTION_IN=1
...
OTHER=9
ID_VALUE=10
'in'=1
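
If you would rather not read the file by eye, the number-to-name lookup can be scripted. This is a minimal sketch, assuming the NAME=number layout shown above; the class TokensFileSketch and the method symbolicNames are hypothetical names, and only the lines quoted in this answer are used as input (the elided ones stay elided).

```java
import java.util.HashMap;
import java.util.Map;

public class TokensFileSketch {
    // Builds a type-number -> symbolic-name map from the text of a .tokens file.
    // Literal entries such as 'in'=1 are skipped so they don't overwrite the names.
    static Map<Integer, String> symbolicNames(String tokensFileContent) {
        Map<Integer, String> names = new HashMap<>();
        for (String line : tokensFileContent.split("\\R")) {
            int eq = line.lastIndexOf('=');
            if (eq <= 0 || line.startsWith("'")) {
                continue;   // blank line, literal entry, or no NAME before '='
            }
            try {
                int type = Integer.parseInt(line.substring(eq + 1).trim());
                names.put(type, line.substring(0, eq));
            } catch (NumberFormatException e) {
                // Not a NAME=number line; ignore it.
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String content = "FUNCTION_IN=1\nOTHER=9\nID_VALUE=10\n'in'=1\n";
        System.out.println(symbolicNames(content).get(10));
    }
}
```

With the four lines above as input, looking up 10 gives ID_VALUE and looking up 1 gives FUNCTION_IN.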

Upvotes: 1
