gel


Why does whitespace matter in this case for ANTLR4?

Let's say I have this grammar, written with ANTLR4:

grammar Test;
start : expr* ;

expr : expr '-' expr
    | INT ;

MINUS : '-' ;
INT: MINUS? DIGIT+ ; // Disclaimer: this definition of an integer is just for illustration purposes

fragment DIGIT : '0'..'9' ; // only used inside INT, never emitted as a token on its own

WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines

My reasoning is that 1-1 should be parsed the same as 1 - 1, i.e. as expr '-' expr. For 1 - 1 the tree is:

      start
        |
     expr(-)
     /     \
 expr(1)  expr(1)

The tree above looks correct: the input is matched as expr '-' expr.

But without spaces, ANTLR sees two separate INT expressions. For 1-1 the tree is:

       start
      /     \
 expr(1)  expr(-1)

Shouldn't all whitespace be skipped by the WS rule, so that both inputs are parsed the same way?


Answers (1)

Bart Kiers

Lexer rules match as many characters as possible, so - 1 is tokenised as a MINUS followed by an INT, while -1 (without the space) is tokenised as a single INT.
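
A minimal sketch of this longest-match behaviour, assuming a hand-rolled regex lexer in Python that mimics the question's MINUS, INT and WS rules (ANTLR's real lexer is generated code; RULES and tokenize are hypothetical names). When two rules match the same number of characters, the rule listed first wins, as in ANTLR. DIGIT is left out because INT, listed earlier, always matches at least as much input.

```python
import re

# Hypothetical model of the question's lexer rules, in grammar order.
# Not ANTLR's API: it only mimics "longest match wins, ties go to the
# rule listed first" plus skipping WS.
RULES = [
    ("MINUS", re.compile(r"-")),
    ("INT", re.compile(r"-?[0-9]+")),   # INT : MINUS? DIGIT+ ;
    ("WS", re.compile(r"[ \t\r\n]+")),  # WS  : [ \t\r\n]+ -> skip ;
]
SKIPPED = {"WS"}


def tokenize(text):
    """Return a list of (type, text) tuples, dropping skipped tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_type, best_len = None, 0
        for rule_type, pattern in RULES:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so on a tie the rule listed first keeps the match.
            if m and len(m.group()) > best_len:
                best_type, best_len = rule_type, len(m.group())
        if best_type is None:
            raise ValueError(f"no rule matches at position {pos}: {text[pos]!r}")
        if best_type not in SKIPPED:
            tokens.append((best_type, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("1 - 1"))  # [('INT', '1'), ('MINUS', '-'), ('INT', '1')]
print(tokenize("1-1"))    # [('INT', '1'), ('INT', '-1')]
```

In 1-1, MINUS matches one character at the '-' but INT matches two ("-1"), so INT wins. In 1 - 1 the space after '-' stops INT from matching at all, leaving MINUS.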

You must realise that the lexer does not listen to the parser. Even if the parser would like to match the tokens INT MINUS INT for the input 1-1, the lexer will not produce those tokens. Because the lexer matches as many characters as possible, it will always create two INT tokens for that input, 1 and -1 (no MINUS!). Tokenisation and parsing are two separate steps: the WS rule only discards whitespace after it has been matched, but whether a space is present still decides where the other tokens begin and end.
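
To make the separation concrete, here is a hypothetical Python model (not ANTLR's generated code) of a parser for the question's start and expr rules. It only ever receives a finished token list; the two lists below are written out by hand to match the tokenisation described above. The left-recursive expr : expr '-' expr | INT is modelled as the loop INT ('-' INT)*, building left-associative trees.

```python
# Hypothetical hand-written parser for:
#   start : expr* ;
#   expr  : expr '-' expr | INT ;
# It consumes an already-produced token list and cannot ask the lexer
# to split a token differently.


def parse_start(tokens):
    """Parse `start : expr*` and return a list of expression trees."""
    pos = 0
    exprs = []
    while pos < len(tokens):
        tree, pos = parse_expr(tokens, pos)
        exprs.append(tree)
    return exprs


def parse_expr(tokens, pos):
    """Parse one expr starting at tokens[pos]; return (tree, next_pos)."""
    kind, text = tokens[pos]
    if kind != "INT":
        raise SyntaxError(f"expected INT at token {pos}, got {kind} {text!r}")
    tree = text
    pos += 1
    # Consume "MINUS INT" pairs while the next token is a MINUS.
    while pos < len(tokens) and tokens[pos][0] == "MINUS":
        if pos + 1 >= len(tokens) or tokens[pos + 1][0] != "INT":
            raise SyntaxError(f"expected INT after '-' at token {pos}")
        tree = ("-", tree, tokens[pos + 1][1])
        pos += 2
    return tree, pos


with_spaces = [("INT", "1"), ("MINUS", "-"), ("INT", "1")]  # from "1 - 1"
without_spaces = [("INT", "1"), ("INT", "-1")]               # from "1-1"

print(parse_start(with_spaces))     # [('-', '1', '1')]
print(parse_start(without_spaces))  # ['1', '-1']
```

Nothing in parse_expr can turn the INT token -1 back into MINUS and INT, which is why the second input yields two separate expressions.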
