grackkle

ANTLR 4 ignores tokens

I'm trying to parse a text file with ANTLR 4, but some of my defined tokens are consistently ignored in favor of others. Here is a small example that shows what I mean:

File to parse:

hello world
hello world

Grammar:

grammar TestLexer;

file : line line;
line : 'hello' ' ' 'world' '\n';

LINE : ~[\n]+? '\n';

The ANTLR book explains that 'hello' becomes an implicit token, which is defined before the LINE token, and that token order matters. So I expected the lexer NOT to match the LINE token, but it does, as the resulting tree shows:

[Image: Unexpected Result (parse tree)]

How can I fix this so that I get the implicit tokens instead?

By the way, I also tried defining explicit tokens before LINE, but that didn't change anything.

Answers (1)

grackkle

Found it myself:

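It turns out that ANTLR's lexer always picks the rule that matches the longest piece of input; the order in which rules are defined only matters when two rules match the same number of characters. Since LINE always matches a whole line, it is always longer than the implicit tokens and is therefore always preferred.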
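To make this concrete, here is a small Python sketch (not ANTLR code) that imitates how the lexer chooses between rules: every rule is tried at the current position, the longest match wins, and a tie goes to the rule defined first. The regexes are hand-written approximations of the grammar's rules, and the rule labels and the `next_token`/`tokenize` helpers are hypothetical; ANTLR's generated lexer works very differently internally. It also assumes the input file ends with a newline.

```python
import re

# ASSUMPTION: hand-written regex approximations of the question's lexer rules,
# not ANTLR output. They are listed in ANTLR's definition order: the implicit
# tokens created from the literals in `line` come first, LINE comes last.
RULES = [
    ("'hello'", re.compile(r"hello")),
    ("' '", re.compile(r" ")),
    ("'world'", re.compile(r"world")),
    ("'\\n'", re.compile(r"\n")),
    ("LINE", re.compile(r"[^\n]+?\n")),  # LINE : ~[\n]+? '\n';
]


def next_token(rules, text, pos):
    """Return (rule name, lexeme) for the longest match at pos.

    max() returns the first of several equally long candidates, so ties
    go to the rule defined earlier, as in ANTLR.
    """
    candidates = []
    for name, pattern in rules:
        m = pattern.match(text, pos)
        if m:
            candidates.append((name, m.group()))
    if not candidates:
        raise ValueError(f"no rule matches at position {pos}")
    return max(candidates, key=lambda c: len(c[1]))


def tokenize(rules, text):
    """Split text into (rule name, lexeme) pairs using next_token."""
    pos, tokens = 0, []
    while pos < len(text):
        name, lexeme = next_token(rules, text, pos)
        tokens.append((name, lexeme))
        pos += len(lexeme)
    return tokens


# Assumes the input file ends with a newline.
print(tokenize(RULES, "hello world\nhello world\n"))
# -> [('LINE', 'hello world\n'), ('LINE', 'hello world\n')]
```

At position 0, 'hello' matches 5 characters while LINE matches all 12 characters of the line, so LINE wins even though it is defined last.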

To still include a catch-all ("joker") token in a grammar, it should match only a single character. In my case

grammar TestLexer;

file : line line;
line : 'hello' ' ' 'world' '\n';

LINE : ~[\n];    // matches exactly one character that is not a newline

would work.
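The same kind of hypothetical Python sketch (hand-written regex approximations, not ANTLR's generated lexer; `next_token` and `tokenize` are illustrative helpers) with LINE reduced to a single character shows why this works: 'hello' and 'world' now beat LINE by length, and for a space the implicit ' ' token and LINE tie at one character, so the rule defined first, the implicit token, wins.

```python
import re

# ASSUMPTION: hand-written regex approximations of the fixed grammar's lexer
# rules, in ANTLR's definition order (implicit literal tokens first).
RULES = [
    ("'hello'", re.compile(r"hello")),
    ("' '", re.compile(r" ")),
    ("'world'", re.compile(r"world")),
    ("'\\n'", re.compile(r"\n")),
    ("LINE", re.compile(r"[^\n]")),  # LINE : ~[\n];
]


def next_token(rules, text, pos):
    """Return (rule name, lexeme) for the longest match at pos.

    max() returns the first of several equally long candidates, so ties
    go to the rule defined earlier, as in ANTLR.
    """
    candidates = []
    for name, pattern in rules:
        m = pattern.match(text, pos)
        if m:
            candidates.append((name, m.group()))
    if not candidates:
        raise ValueError(f"no rule matches at position {pos}")
    return max(candidates, key=lambda c: len(c[1]))


def tokenize(rules, text):
    """Split text into (rule name, lexeme) pairs using next_token."""
    pos, tokens = 0, []
    while pos < len(text):
        name, lexeme = next_token(rules, text, pos)
        tokens.append((name, lexeme))
        pos += len(lexeme)
    return tokens


print(tokenize(RULES, "hello world\nhello world\n"))
print(tokenize(RULES, "hi\n"))
```

The first input now yields one token per literal ('hello', ' ', 'world', '\n') on each line. Characters not covered by any literal, as in "hi\n", come out as one-character LINE tokens, which is the "joker" behavior the single-character rule is meant to provide.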
