Reputation: 536
I have a Lexer Rule as follows:
PREFIX : [abcd]'_';
EXTRA : ('xyz' | 'XYZ' );
SUFFIX : [ab];
TCHAN : PREFIX EXTRA? DIGIT+ SUFFIX?;
and a parser rule:
tpin : TCHAN
;
In the exit_tpin() Listiner method, is there a syntax where I can extract the DIGIT component of the token? Right now I can get the ctx.TCHAN() element, but this is a string. I just want the digit portion of TCHAN.
Or should I remove TCHAN as a TOKEN and move that rule to be tpin (i.e)
tpin : PREFIX EXTRA? DIGIT+ SUFFIX?
Where I know how to extract DIGIT from the listener.
My guess is that by the time the TOKEN is presented to the parser it is too late to deconstruct it... but I was wondering if some ANTLR guru's out there knew of a technique.
If I re-write my TOKENIZER, there is a possiblity that TCHAN tokens will be missed for INT/ID tokens (I think thats why I ended up parsing as I do).
I can always do some regexp work in the listener method... but that seemed like bad form ... as I had the individual components earlier. I'm just lazy, and was wondering if a techniqe other than refactoring the parsing grammar was possible.
Upvotes: 1
Views: 1130
Reputation: 3734
In The Definitive ANTLR Reference you can find examples of complex lexers where much of the work is done. But when learning ANTLR, I would advise to consider the lexer mostly for its splitting function of the input stream into small tokens. Then do the big work in the parser. In the present case I would do :
grammar Question;
/* extract digit */
question
: tpin EOF
;
tpin
// : PREFIX EXTRA? DIGIT+ SUFFIX?
// {System.out.println("The only useful information is " + $DIGIT.text);}
: PREFIX EXTRA? number SUFFIX?
{System.out.println("The only useful information is " + $number.text);}
;
number
: DIGIT+
;
PREFIX : [abcd]'_';
EXTRA : ('xyz' | 'XYZ' );
DIGIT : [0-9] ;
SUFFIX : [ab];
WS : [ \t\r\n]+ -> skip ;
Say the input is d_xyz123456b
. With the first version
: PREFIX EXTRA? DIGIT+ SUFFIX?
you get
$ grun Question question -tokens data.txt
[@0,0:1='d_',<PREFIX>,1:0]
[@1,2:4='xyz',<EXTRA>,1:2]
[@2,5:5='1',<DIGIT>,1:5]
[@3,6:6='2',<DIGIT>,1:6]
[@4,7:7='3',<DIGIT>,1:7]
[@5,8:8='4',<DIGIT>,1:8]
[@6,9:9='5',<DIGIT>,1:9]
[@7,10:10='6',<DIGIT>,1:10]
[@8,11:11='b',<SUFFIX>,1:11]
[@9,13:12='<EOF>',<EOF>,2:0]
The only useful information is 6
Because the parsing of DIGIT+
translates to a loop which reuses DIGIT
setState(12);
_errHandler.sync(this);
_la = _input.LA(1);
do {
{
{
setState(11);
((TpinContext)_localctx).DIGIT = match(DIGIT);
}
}
setState(14);
_errHandler.sync(this);
_la = _input.LA(1);
} while ( _la==DIGIT );
and $DIGIT.text
translates to ((TpinContext)_localctx).DIGIT.getText()
, only the last digit is retained. That's why I define a subrule number
: PREFIX EXTRA? number SUFFIX?
which makes it easy to capture the value :
[@0,0:1='d_',<PREFIX>,1:0]
[@1,2:4='xyz',<EXTRA>,1:2]
[@2,5:5='1',<DIGIT>,1:5]
[@3,6:6='2',<DIGIT>,1:6]
[@4,7:7='3',<DIGIT>,1:7]
[@5,8:8='4',<DIGIT>,1:8]
[@6,9:9='5',<DIGIT>,1:9]
[@7,10:10='6',<DIGIT>,1:10]
[@8,11:11='b',<SUFFIX>,1:11]
[@9,13:12='<EOF>',<EOF>,2:0]
The only useful information is 123456
You can even make it simpler :
tpin
: PREFIX EXTRA? INT SUFFIX?
{System.out.println("The only useful information is " + $INT.text);}
;
PREFIX : [abcd]'_';
EXTRA : ('xyz' | 'XYZ' );
INT : [0-9]+ ;
SUFFIX : [ab];
WS : [ \t\r\n]+ -> skip ;
$ grun Question question -tokens data.txt
[@0,0:1='d_',<PREFIX>,1:0]
[@1,2:4='xyz',<EXTRA>,1:2]
[@2,5:10='123456',<INT>,1:5]
[@3,11:11='b',<SUFFIX>,1:11]
[@4,13:12='<EOF>',<EOF>,2:0]
The only useful information is 123456
In the listener you have a direct access to these values through the rule context TpinContext
:
public static class TpinContext extends ParserRuleContext {
public Token INT;
public TerminalNode PREFIX() { return getToken(QuestionParser.PREFIX, 0); }
public TerminalNode INT() { return getToken(QuestionParser.INT, 0); }
public TerminalNode EXTRA() { return getToken(QuestionParser.EXTRA, 0); }
public TerminalNode SUFFIX() { return getToken(QuestionParser.SUFFIX, 0); }
Upvotes: 1