I have this simple grammar file:
expr : ID Divider ID;
divider_stat : 'Divider' Divider;
Divider : '#';
ID : ALPHA ('_' | ALPHA | DIGIT)*;
fragment ALPHA : [a-zA-Z];
fragment DIGIT : [0-9];
SkipTokens : [ \t\r\n]+ -> skip;
In this case, Divider is fixed (#). But in the real scenario, Divider will be defined as the character that follows the 'Divider' keyword.
Is there any way to set Divider according to the value in divider_stat?
For input:
Divider -
id1 - id2
Tokens will be:
<ID>,'id1'
<Divider>,'-'
<ID>,'id2'
For input:
Divider $
id1$id2
Tokens will be:
<ID>,'id1'
<Divider>,'$'
<ID>,'id2'
The divider is always a single character.
You could use a lexical mode for this, a bit of target-specific code and a predicate. Whenever the lexer "sees" the keyword "Divider", it moves into DividerMode, where only whitespace (which is skipped) or a single non-whitespace char can be matched; that char becomes the new divider. Inside the Divider lexer rule, you first check (with a predicate) whether the next character in the stream is the current divider char.
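To make the mechanics visible without ANTLR, here is a minimal hand-written sketch of the same idea. It is an illustration only: the class and method names (DividerLexerSketch, tokenize) are hypothetical, and the loop simply assumes the token rules from the question (ID, skipped whitespace, a 'Divider' keyword followed by the new divider char). The local divider variable plays the role of the @members field, reading the char after the keyword plays the role of DividerMode, and the c == divider test plays the role of the predicate.

```java
import java.util.ArrayList;
import java.util.List;

public class DividerLexerSketch {

    // Hypothetical hand-written lexer; NOT generated by ANTLR.
    // Tokens are rendered as strings in the question's "<TYPE>,'text'" format.
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        char divider = '#';          // default divider, like the @members field
        int i = 0;
        int n = input.length();
        while (i < n) {
            char c = input.charAt(i);
            if (isSpace(c)) {
                i++;                 // SkipTokens: whitespace is dropped
            } else if (isAlpha(c)) {
                // ID : ALPHA ('_' | ALPHA | DIGIT)*  (longest match)
                int start = i++;
                while (i < n && (isAlpha(input.charAt(i)) || isDigit(input.charAt(i)) || input.charAt(i) == '_')) {
                    i++;
                }
                String word = input.substring(start, i);
                if (word.equals("Divider")) {
                    // K_Divider plus "DividerMode": skip whitespace,
                    // then the next char becomes the new divider
                    while (i < n && isSpace(input.charAt(i))) {
                        i++;
                    }
                    if (i < n) {
                        divider = input.charAt(i++);
                    }
                } else {
                    tokens.add("<ID>,'" + word + "'");
                }
            } else if (c == divider) {
                // Divider : {_input.LA(1) == divider}? . ;
                tokens.add("<Divider>,'" + c + "'");
                i++;
            } else {
                // ANTLR would report a token recognition error and carry on;
                // this sketch just throws
                throw new IllegalStateException("unexpected char '" + c + "' at index " + i);
            }
        }
        return tokens;
    }

    private static boolean isSpace(char c) { return c == ' ' || c == '\t' || c == '\r' || c == '\n'; }

    private static boolean isAlpha(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }

    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    public static void main(String[] args) {
        System.out.println(tokenize("Divider -\nid1 - id2"));
        System.out.println(tokenize("Divider $\nid1$id2"));
    }
}
```

Running main prints the token lists for the two inputs from the question, e.g. [<ID>,'id1', <Divider>,'-', <ID>,'id2'] for the first one. Because "Divider" is only treated as the keyword when the whole identifier equals it, an input like "Dividerx" still lexes as an ID, matching ANTLR's longest-match rule.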
Here's a small Java demo:
DemoLexer.g4
lexer grammar DemoLexer;
@members {
private char divider = '#';
}
K_Divider : 'Divider' -> skip, pushMode(DividerMode);
Divider : {_input.LA(1) == divider}? . ;
ID : ALPHA ('_' | ALPHA | DIGIT)*;
fragment ALPHA : [a-zA-Z];
fragment DIGIT : [0-9];
SkipTokens : [ \t\r\n]+ -> skip;
mode DividerMode;
Spaces : [ \t\r\n]+ -> skip;
NewDivider : ~[ \t\r\n] {this.divider = getText().charAt(0);} -> skip, popMode;
DemoParser.g4
parser grammar DemoParser;
options {
tokenVocab=DemoLexer;
}
parse : expr+ EOF;
expr : ID Divider ID;
And a small Java class to test it all:
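import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTree;

public class Main {
    public static void main(String[] args) {
        // '#' is the default divider; "Divider -" switches it to '-'
        String source =
                "id1 # id2\n" +
                "Divider -\n" +
                "id3 - id4";
        DemoLexer lexer = new DemoLexer(CharStreams.fromString(source));
        DemoParser parser = new DemoParser(new CommonTokenStream(lexer));
        ParseTree root = parser.parse();
        System.out.println(root.toStringTree(parser));
    }
}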
will print:
(parse (expr id1 # id2) (expr id3 - id4) <EOF>)
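Lexical modes are only allowed in lexer grammars, so with this approach the lexer and parser rules must live in separate grammar files. You could also use a combined grammar, but then you'd need to match the keyword, the whitespace after it and the new divider char (e.g. Divider -) in one go: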
Demo.g4
grammar Demo;
@lexer::members {
private char divider = '#';
}
parse : expr+ EOF;
expr : ID Divider ID;
K_Divider : 'Divider' [ \t\r\n]+ ~[ \t\r\n] {this.divider = getText().charAt(getText().length() - 1);} -> skip;
Divider : {_input.LA(1) == divider}? . ;
ID : ALPHA ('_' | ALPHA | DIGIT)*;
fragment ALPHA : [a-zA-Z];
fragment DIGIT : [0-9];
SkipTokens : [ \t\r\n]+ -> skip;
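In the combined variant, a single K_Divider token covers 'Divider', the whitespace and the new char, which is why the action takes the last char of the matched text. A small plain-Java sketch of just that step follows; it uses java.util.regex with a pattern that mirrors the K_Divider rule, and the class and method names (CombinedDividerSketch, updateDivider) are hypothetical. Regex matching is not how ANTLR's lexer works internally; it only reproduces the same match for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CombinedDividerSketch {

    // Mirrors the combined-grammar rule: 'Divider' [ \t\r\n]+ ~[ \t\r\n]
    private static final Pattern K_DIVIDER = Pattern.compile("Divider[ \\t\\r\\n]+[^ \\t\\r\\n]");

    // Returns the new divider if a K_Divider match starts at 'pos',
    // otherwise keeps 'current'. Like the grammar's action, the new
    // divider is the last char of the matched text.
    public static char updateDivider(String input, int pos, char current) {
        Matcher m = K_DIVIDER.matcher(input);
        m.region(pos, input.length());
        if (m.lookingAt()) {
            String text = m.group();
            return text.charAt(text.length() - 1);
        }
        return current;
    }

    public static void main(String[] args) {
        String source = "id1 # id2\nDivider -\nid3 - id4";
        int pos = source.indexOf("Divider");
        System.out.println(updateDivider(source, pos, '#')); // prints -
        System.out.println(updateDivider(source, 0, '#'));   // prints #
    }
}
```

Note that "Divider-" (no whitespace) does not match here, just as it would not match K_Divider, because the rule requires at least one whitespace char before the new divider.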