Lars Steinmetz

Reputation: 29

Problems with ANTLR4 grammar

I have a very simple grammar file, which looks like this:

grammar Wort;

// Parser Rules:
word
    :   ANY_WORD EOF
    ;

// Lexer Rules:
ANY_WORD
    : SMALL_WORD | CAPITAL_WORD
    ;
SMALL_WORD 
    : SMALL_LETTER (SMALL_LETTER)+
    ;
CAPITAL_WORD 
    : CAPITAL_LETTER (SMALL_LETTER)+
    ;
fragment SMALL_LETTER
    : ('a'..'z')
    ;
fragment CAPITAL_LETTER
    : ('A'..'Z')
    ;

If I try to parse the input "Hello", everything is OK, but if I modify my grammar file like this:

...

// Parser Rules:
word
    :   CAPITAL_WORD EOF
    ;

...

the input "Hello" is no longer recognized as valid. Can anybody explain what is going wrong?

Thanks, Lars

Upvotes: 2

Views: 107

Answers (1)

shaboptimal

Reputation: 11

The issue is how the ANTLR lexer resolves ambiguity between lexer rules. The lexer picks the rule that matches the longest stretch of input and, when several rules match the same text, the one defined first in the grammar. For Hello, both ANY_WORD and CAPITAL_WORD match all five characters, and because ANY_WORD is listed before CAPITAL_WORD, the lexer emits an ANY_WORD token. The parser only sees the lexer's output, so it receives ANY_WORD EOF. Your modified rule expects CAPITAL_WORD EOF, so the parse fails.

You can make the lexer behave differently by moving CAPITAL_WORD above ANY_WORD in the grammar, but that creates the opposite problem: capitalized words will then always lex as CAPITAL_WORD and never as ANY_WORD. The best thing to do is probably what Mephy suggested: make ANY_WORD a parser rule.
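
As a rough sketch of that suggestion (the rule name anyWord is my own choice, since parser rule names must start with a lowercase letter; the rest is taken from the grammar in the question), the grammar could look something like this:

```antlr
grammar Wort;

// Parser rules:
word
    :   CAPITAL_WORD EOF
    ;

// Formerly the lexer rule ANY_WORD. As a parser rule it no longer
// competes with SMALL_WORD and CAPITAL_WORD during tokenization.
anyWord
    :   SMALL_WORD
    |   CAPITAL_WORD
    ;

// Lexer rules:
SMALL_WORD
    : SMALL_LETTER SMALL_LETTER+
    ;
CAPITAL_WORD
    : CAPITAL_LETTER SMALL_LETTER+
    ;
fragment SMALL_LETTER
    : 'a'..'z'
    ;
fragment CAPITAL_LETTER
    : 'A'..'Z'
    ;
```

With this layout, SMALL_WORD and CAPITAL_WORD can never match the same text, so Hello should lex as CAPITAL_WORD and the word rule can match it. Any parser rule that needs to accept either kind of word can refer to anyWord instead.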

Upvotes: 1
