diwatu

Reputation: 5699

How to use the same word in different optional lexer tokens

I simplified my ANTLR4 grammar to this:

grammar test;

directives:
    ('[' directive ']')* EOF;


directive:      
      KEY1 OPERATOR OPTIONS1
    | KEY2 OPERATOR OPTIONS2;
OPERATOR: '=';

KEY1: 'Key1';
KEY2: 'Key2';

OPTIONS1: 'a'|'b'|'c';
OPTIONS2: 'c'|'d'|'e';

When I try to use this grammar to parse:

[Key1=a][Key2=c]

The parser gives an error:

line 1:14 mismatched input 'c' expecting OPTIONS2

In my real grammar, OPTIONS1 and OPTIONS2 are different enum data types, and 'c' is a value that belongs to both.

Upvotes: 0

Views: 41

Answers (1)

eocron

Reputation: 7526

You should split the overlapping alternative out of the two lexer rules. Instead of:

OPTIONS1: 'a'|'b'|'c';
OPTIONS2: 'c'|'d'|'e';

So your lexer rules become:

OPTIONS1: 'a'|'b';
OPTIONS2: 'd'|'e';
OPTIONS3: 'c';

and:

directive:
      KEY1 OPERATOR (OPTIONS1 | OPTIONS3)
    | KEY2 OPERATOR (OPTIONS2 | OPTIONS3);

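This happens because the lexer tokenizes the input on its own, without knowing which token the parser expects at that point. When several lexer rules match the same text, the rule defined first in the grammar wins, so your 'c' is always tokenized as OPTIONS1 and never as OPTIONS2.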
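To make the tie-breaking concrete, here is a small Python sketch of that behavior. It is a toy model, not ANTLR itself: the `tokenize` function and the `LBRACK`/`RBRACK` token names are made up for illustration (in the grammar, '[' and ']' are implicit tokens), and it assumes only the two rules described above, longest match first and then first-defined rule on a tie.

```python
# Toy model of how an ANTLR-style lexer picks a token type (not ANTLR itself).
# Assumed rules: the longest match wins; on a tie, the rule defined first wins.

def tokenize(text, rules):
    """Tokenize text using an ordered list of (token_name, [literals])."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None  # (match_length, token_name, literal)
        for name, literals in rules:
            for lit in literals:
                if text.startswith(lit, pos):
                    # Replace only on a strictly longer match,
                    # so earlier rules keep ties.
                    if best is None or len(lit) > best[0]:
                        best = (len(lit), name, lit)
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best[1], best[2]))
        pos += best[0]
    return tokens

# Hypothetical names LBRACK/RBRACK stand in for the implicit '[' and ']' tokens.
COMMON = [
    ("LBRACK", ["["]), ("RBRACK", ["]"]),
    ("OPERATOR", ["="]),
    ("KEY1", ["Key1"]), ("KEY2", ["Key2"]),
]

# Original lexer rules: 'c' appears in both OPTIONS1 and OPTIONS2.
ORIGINAL = COMMON + [
    ("OPTIONS1", ["a", "b", "c"]),
    ("OPTIONS2", ["c", "d", "e"]),
]

# Split lexer rules: the shared 'c' gets its own token type.
SPLIT = COMMON + [
    ("OPTIONS1", ["a", "b"]),
    ("OPTIONS2", ["d", "e"]),
    ("OPTIONS3", ["c"]),
]

if __name__ == "__main__":
    print(tokenize("[Key2=c]", ORIGINAL))
    print(tokenize("[Key2=c]", SPLIT))
```

With the original rules the 'c' after Key2 comes out as OPTIONS1, which is why the parser complains that it expected OPTIONS2. With the split rules it comes out as OPTIONS3, which both alternatives of `directive` accept.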

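Alternatively, you can inline the literals directly in the parser rule. ANTLR then creates an implicit token for each literal, so 'c' is a single token type that both alternatives accept, and the OPTIONS1 and OPTIONS2 lexer rules are no longer needed. This will work too: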

directive:      
      KEY1 OPERATOR ('a'|'b'|'c')
    | KEY2 OPERATOR ('c'|'d'|'e');

It is worth checking the current ANTLR documentation on how literals in parser rules are handled. The drawback of inlining is that you will no longer see OPTIONS1 and OPTIONS2 as named tokens in the parse tree (AST) view.

Upvotes: 1
