user14781215

Set Lexer rule from value in input stream

I have this simple grammar file:

expr         : ID Divider ID;
divider_stat : 'Divider' Divider;

Divider : '#';

ID      : ALPHA ('_' | ALPHA | DIGIT)*;

fragment ALPHA  : [a-zA-Z];
fragment DIGIT  : [0-9];

SkipTokens  : [ \t\r\n]+ -> skip;

In this case, Divider is fixed (#). In the real scenario, however, Divider is defined as the character that follows the 'Divider' keyword.

Is there any way to set Divider according to the value in divider_stat?

For input:

Divider -
id1 - id2

Tokens will be:

<ID>,'id1'
<Divider>,'-'
<ID>,'id2'

For input:

Divider $
id1$id2

Tokens will be:

<ID>,'id1'
<Divider>,'$'
<ID>,'id2'

The divider is always a single character.


Answers (1)

Bart Kiers

You could use a lexical mode for this, plus a bit of target-specific code and a predicate. Whenever the lexer "sees" the keyword "Divider", it moves into DividerMode, where either whitespace is matched (and skipped) or a single non-whitespace character is matched, which becomes the new divider char. Inside the Divider lexer rule, you first check (with a predicate) whether the next character in the stream is the current divider char.

Here's a small Java demo:

DemoLexer.g4

lexer grammar DemoLexer;

@members {
  private char divider = '#';
}

K_Divider : 'Divider' -> skip, pushMode(DividerMode);
Divider   : {_input.LA(1) == divider}? . ;
ID        : ALPHA ('_' | ALPHA | DIGIT)*;

fragment ALPHA  : [a-zA-Z];
fragment DIGIT  : [0-9];

SkipTokens  : [ \t\r\n]+ -> skip;

mode DividerMode;
 Spaces     : [ \t\r\n]+ -> skip;
 NewDivider : ~[ \t\r\n] {this.divider = getText().charAt(0);} -> skip, popMode;

DemoParser.g4

parser grammar DemoParser;

options {
  tokenVocab=DemoLexer;
}

parse : expr+ EOF;
expr  : ID Divider ID;

And a small Java class to test it all:

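import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTree;

public class Main {
  public static void main(String[] args) {
    String source =
        "id1 # id2\n" +
        "Divider -\n" +
        "id3 - id4";

    DemoLexer lexer = new DemoLexer(CharStreams.fromString(source));
    DemoParser parser = new DemoParser(new CommonTokenStream(lexer));
    ParseTree root = parser.parse();

    System.out.println(root.toStringTree(parser));
  }
}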

will print:

(parse (expr id1 # id2) (expr id3 - id4) <EOF>)
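
If it helps to see the mechanism without ANTLR in the way, here is a rough plain-Java sketch of the same idea: a boolean plays the role of DividerMode, and a comparison against the current divider char plays the role of the predicate. The class DividerSketch and its tokenize method are hypothetical names made up for this illustration and are not generated by ANTLR. The sketch also assumes the divider is never a letter, digit or underscore, which sidesteps ANTLR's longest-match rule.

```java
import java.util.ArrayList;
import java.util.List;

public class DividerSketch {

    // Hypothetical helper: returns tokens as "ID:text" or "Divider:c" strings.
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        char divider = '#';          // current divider, like the lexer member
        boolean dividerMode = false; // true right after the Divider keyword
        int i = 0;

        while (i < src.length()) {
            char c = src.charAt(i);

            // Whitespace is skipped in both "modes".
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                i++;
                continue;
            }
            // In divider mode, the first non-whitespace char becomes the divider.
            if (dividerMode) {
                divider = c;
                dividerMode = false;
                i++;
                continue;
            }
            // The "predicate": only the current divider char is a Divider token.
            if (c == divider) {
                tokens.add("Divider:" + c);
                i++;
                continue;
            }
            // ID: ALPHA ('_' | ALPHA | DIGIT)*
            if (isAlpha(c)) {
                int start = i;
                while (i < src.length() && isIdPart(src.charAt(i))) {
                    i++;
                }
                String text = src.substring(start, i);
                if (text.equals("Divider")) {
                    dividerMode = true; // keyword itself is not emitted
                } else {
                    tokens.add("ID:" + text);
                }
                continue;
            }
            throw new IllegalArgumentException(
                "Unexpected character '" + c + "' at index " + i);
        }
        return tokens;
    }

    private static boolean isAlpha(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static boolean isIdPart(char c) {
        return c == '_' || isAlpha(c) || (c >= '0' && c <= '9');
    }

    public static void main(String[] args) {
        System.out.println(tokenize("id1 # id2\nDivider -\nid3 - id4"));
    }
}
```

Running main with the same input as the demo should list id1, #, id2, id3, -, id4 as alternating ID and Divider tokens, mirroring the parse tree above.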

When using lexical modes, you need to separate the lexer and parser grammar files. You could also use a combined grammar, but then you'd need to match the Divider keyword together with the new divider char in one go:

Demo.g4

grammar Demo;

@lexer::members {
  private char divider = '#';
}

parse : expr+ EOF;
expr  : ID Divider ID;

K_Divider : 'Divider' [ \t\r\n]+ ~[ \t\r\n] {this.divider = getText().charAt(getText().length() - 1);} -> skip;
Divider   : {_input.LA(1) == divider}? . ;
ID        : ALPHA ('_' | ALPHA | DIGIT)*;

fragment ALPHA  : [a-zA-Z];
fragment DIGIT  : [0-9];

SkipTokens  : [ \t\r\n]+ -> skip;
