Dims

Reputation: 50989

How to avoid creating parasitic lexer rules in ANTLR4?

The grammar below does not work as I expect:

program:
    (keyword |
    string |
    WS)*;

keyword: 'print';

string: QUOTE (CH | WS)*? QUOTE;

QUOTE: '\'';

WS  : [ \t\r\n]+;

CH: .;

The goal is to have a language with both string literals and keywords.

The input being parsed is as follows:

print 'printed'

It should be parsed as a keyword, then whitespace, then a string literal.

It is parsed this way instead:

[Image of the resulting parse; not reproduced]

Evidently, it sees the keyword print inside the string literal. This is because ANTLR has implicitly created a parasitic lexer rule for 'print'.

How can I avoid or work around this?

I don't want to specify that a string literal can contain keywords, because that is logically incorrect.

I also can't use the DOT wildcard operator, because I don't want to allow every token inside the quotes (a quote should not occur there).

So, what should I do?

Upvotes: 1

Views: 152

Answers (1)

Sam Harwell

Reputation: 99859

If you separate your combined grammar into a separate lexer grammar and parser grammar, ANTLR will not allow you to implicitly define lexer rules via literals placed in a parser rule. If you want print to be a keyword, you would need to include this lexer rule (otherwise 'print' would not be allowed in a parser rule):

PRINT : 'print';

The next step is to convert string from a parser rule to a lexer rule, such as this:

STRING : QUOTE ~'\''* QUOTE;
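Putting the two steps together, a split grammar might look roughly like the sketch below. The names DemoLexer and DemoParser are hypothetical, and turning QUOTE into a fragment is an assumption made here so that a quote is never emitted as a token of its own; neither is part of the original grammar. The two grammars go in separate files.

```antlr
// ---- DemoLexer.g4 (hypothetical file name) ----
lexer grammar DemoLexer;

// Keywords must now be declared explicitly as lexer rules.
PRINT  : 'print';

// The whole literal is one token: a quote, any run of non-quote
// characters, then the closing quote.
STRING : QUOTE ~'\''* QUOTE;

WS     : [ \t\r\n]+;

// Assumption: QUOTE is only a helper for STRING, so it is a fragment.
fragment QUOTE : '\'';

// ---- DemoParser.g4 (hypothetical file name) ----
parser grammar DemoParser;

// Take the token types from the lexer grammar above.
options { tokenVocab = DemoLexer; }

program : (keyword | string | WS)*;

keyword : PRINT;

string  : STRING;
```

With this layout the lexer picks the longest match at each position, so for the input print 'printed' it should produce PRINT, WS, STRING: the quoted text is consumed by STRING as a whole, and PRINT never gets a chance to match inside it.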

Upvotes: 2
