How to use same word in different optional Lexer tokens

Question

I simplified my ANTLR4 grammar to this:

grammar test;

directives:
    ('[' directive ']')* EOF;


directive:
      KEY1 OPERATOR OPTIONS1
    | KEY2 OPERATOR OPTIONS2;

OPERATOR: '=';

KEY1: 'Key1';
KEY2: 'Key2';

OPTIONS1: 'a'|'b'|'c';
OPTIONS2: 'c'|'d'|'e';

When I try to use this grammar to parse:

[Key1=a][Key2=c]

The parser gives an error:

line 1:14 mismatched input 'c' expecting OPTIONS2

In my real work, OPTIONS1 and OPTIONS2 are different enum data types, and 'c' is a value that belongs to both.

eocron · Accepted Answer

You should split out the intersection of the two token sets. Instead of:

OPTIONS1: 'a'|'b'|'c';
OPTIONS2: 'c'|'d'|'e';

So, your rules will be:

OPTIONS1: 'a'|'b';
OPTIONS2: 'd'|'e';
OPTIONS3: 'c';

and:

directive:
      KEY1 OPERATOR (OPTIONS1 | OPTIONS3)
    | KEY2 OPERATOR (OPTIONS2 | OPTIONS3);

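This happens because the lexer tokenizes the input on its own, without knowing which token the parser expects at that point. When several lexer rules match the same text, the rule defined first in the grammar wins, so your 'c' is always emitted as OPTIONS1 and never as OPTIONS2.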
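To make the tie-breaking concrete, here is a minimal Python sketch of that behaviour. It is a simplified model, not ANTLR's actual implementation: it assumes the lexer takes the longest match at each position and, on a tie, the rule listed first. The `tokenize` helper and the `LBRACK`/`RBRACK` names are made up for this illustration (in the grammar above the brackets are literals in the parser rule).

```python
# Simplified model of lexer disambiguation (an assumption for illustration,
# not ANTLR's real code): longest match wins; on a tie, the first rule wins.

def tokenize(text, rules):
    """Tokenize text with an ordered list of (token_name, [literals]) rules."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, literals in rules:
            for lit in literals:
                # Strictly-greater comparison keeps the earlier rule on ties.
                if text.startswith(lit, pos) and len(lit) > best_len:
                    best_name, best_len = name, len(lit)
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}: {text[pos]!r}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Hypothetical names for the bracket literals, plus the shared rules.
COMMON = [
    ("LBRACK", ["["]),
    ("RBRACK", ["]"]),
    ("OPERATOR", ["="]),
    ("KEY1", ["Key1"]),
    ("KEY2", ["Key2"]),
]

# Rules as in the question: 'c' appears in both OPTIONS1 and OPTIONS2.
ORIGINAL_RULES = COMMON + [
    ("OPTIONS1", ["a", "b", "c"]),
    ("OPTIONS2", ["c", "d", "e"]),
]

# Rules with the intersection split out into OPTIONS3.
SPLIT_RULES = COMMON + [
    ("OPTIONS1", ["a", "b"]),
    ("OPTIONS2", ["d", "e"]),
    ("OPTIONS3", ["c"]),
]

original_tokens = tokenize("[Key2=c]", ORIGINAL_RULES)
split_tokens = tokenize("[Key2=c]", SPLIT_RULES)

print(original_tokens[3])  # ('OPTIONS1', 'c')
print(split_tokens[3])     # ('OPTIONS3', 'c')
```

With the original rules the parser receives OPTIONS1 after `Key2=`, which is exactly the mismatch reported in the error. With the split rules it receives OPTIONS3, which both alternatives of `directive` accept.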

Alternatively, if I remember correctly, the literals can be inlined directly in the parser rule, where they behave much like macros, so the rule would look like this (it will work too):

directive:      
      KEY1 OPERATOR ('a'|'b'|'c')
    | KEY2 OPERATOR ('c'|'d'|'e');

It is worth checking the current ANTLR syntax for how literals can be inlined. The drawback is that you will no longer see OPTIONS1 and OPTIONS2 in the AST view.
