How to use the ANTLR 4 TestRig to show which lexer rule is used when tokenizing input?

Question

I have an ANTLR 4 lexer grammar with a BEGIN lexer rule and an ID lexer rule:

lexer grammar Begin;                  

BEGIN : 'begin' ;
ID  : [a-z]+ ;

WS  : [ \t\r\n]+ -> skip ;

After generating the lexer and compiling, I ran the ANTLR TestRig tool with input 'begin':

grun Begin tokens -tokens
begin
^Z

I got this output:

[@0,0:4='begin',<1>,1:0]
[@1,7:6='',<-1>,2:0]

Notice the token type is 1 (as <1> indicates).

I ran it again, this time with input 'beginning':

grun Begin tokens -tokens
beginning
^Z

I got this output:

[@0,0:8='beginning',<1>,1:0]
[@1,11:10='',<-1>,2:0]

Why do I get the same token type? Does that mean the lexer is using the same lexer rule for both inputs?

How do I get TestRig to show me that the lexer uses the rule BEGIN : 'begin' ; to tokenize the input begin, and the rule ID : [a-z]+ ; to tokenize the input beginning?

Andy · Accepted Answer

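I tested this in ANTLRWorks 2.1 with the following setup, a combined grammar with a single parser rule:

grammar Begin;

test: (BEGIN | ID)+;

BEGIN : 'begin' ;
ID  : [a-z]+ ;

WS  : [ \t\r\n]+ -> skip ;

It works as expected. Token types are numbered in the order the lexer rules appear in the grammar, so <1> is BEGIN and <2> is ID: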

With input 'begin':

Arguments: [Begin, test, -tokens, -tree, -gui, C:\ANTLR\Begin.txt]
[@0,0:4='begin',<1>,1:0]
[@1,5:4='',<-1>,1:5]
(test begin)

With input 'beginning':

Arguments: [Begin, test, -tokens, -tree, -gui, C:\ANTLR\Begin.txt]
[@0,0:8='beginning',<2>,1:0]
[@1,9:8='',<-1>,1:9]
(test beginning)
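
The two results above match the way an ANTLR lexer normally chooses between rules: the rule that matches the longest run of input wins, and when two rules match the same length, the one defined first in the grammar wins. The sketch below is not ANTLR code. It only imitates that selection with java.util.regex so the choice can be seen in isolation. The class name RuleChoiceSketch and the method winningRule are hypothetical, made up for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RuleChoiceSketch {
    // Hypothetical stand-ins for the grammar's lexer rules, kept in grammar order.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("BEGIN", Pattern.compile("begin"));
        RULES.put("ID", Pattern.compile("[a-z]+"));
    }

    // Returns the name of the rule that wins at the start of the input:
    // the longest match is preferred, and on a tie the earlier rule is kept.
    static String winningRule(String input) {
        String best = null;
        int bestLen = 0;
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            // Strictly greater, so a later rule never displaces an equal-length match.
            if (m.lookingAt() && m.end() > bestLen) {
                best = rule.getKey();
                bestLen = m.end();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println("begin -> " + winningRule("begin"));
        System.out.println("beginning -> " + winningRule("beginning"));
    }
}
```

For 'begin' both rules match five characters, so BEGIN is kept because it comes first. For 'beginning' only ID can match all nine characters, so ID wins. To see rule names instead of numbers in your own driver program, recent ANTLR 4 runtimes let you look a type up with lexer.getVocabulary().getSymbolicName(token.getType()). The generated Begin.tokens file should also list the name-to-number mapping.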
