Reputation: 29
I have a very simple grammar file, which looks like this:
grammar Wort;
// Parser Rules:
word
: ANY_WORD EOF
;
// Lexer Rules:
ANY_WORD
: SMALL_WORD | CAPITAL_WORD
;
SMALL_WORD
: SMALL_LETTER (SMALL_LETTER)+
;
CAPITAL_WORD
: CAPITAL_LETTER (SMALL_LETTER)+
;
fragment SMALL_LETTER
: ('a'..'z')
;
fragment CAPITAL_LETTER
: ('A'..'Z')
;
If I try to parse the input "Hello", everything is OK, but if I modify my grammar file like this:
...
// Parser Rules:
word
: CAPITAL_WORD EOF
;
...
the input "Hello" is no longer recognized as valid input. Can anybody explain what is going wrong?
Thanx, Lars
Upvotes: 2
Views: 107
Reputation: 11
The issue here has to do with precedence in the lexer grammar. Both ANY_WORD and CAPITAL_WORD match all five characters of Hello, and when two lexer rules match input of the same length, the rule defined first wins. Because ANY_WORD is listed before CAPITAL_WORD, the lexer emits the token ANY_WORD, never CAPITAL_WORD. The parser only sees the lexer's output, and since ANY_WORD EOF doesn't match the modified rule CAPITAL_WORD EOF, the parse fails.
You can make the lexer behave differently by moving CAPITAL_WORD above ANY_WORD in the grammar, but that will create the opposite problem: capitalized words will never lex as ANY_WORD. The best thing to do is probably what Mephy suggested: make ANY_WORD a parser rule.
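As a rough sketch of that suggestion, the grammar could look something like the following. The parser rule name anyWord is my own choice for illustration and is not taken from the question. The lexer now only produces SMALL_WORD and CAPITAL_WORD tokens, so the two rules no longer compete with an overlapping ANY_WORD token, and the parser decides where either kind of word is acceptable.

```antlr
grammar Wort;

// Parser rules:
word
    : anyWord EOF
    ;

// Hypothetical parser rule replacing the ANY_WORD lexer rule.
anyWord
    : SMALL_WORD
    | CAPITAL_WORD
    ;

// Lexer rules: the two token types cannot match the same text,
// so their order no longer matters.
SMALL_WORD
    : SMALL_LETTER SMALL_LETTER+
    ;

CAPITAL_WORD
    : CAPITAL_LETTER SMALL_LETTER+
    ;

fragment SMALL_LETTER
    : 'a'..'z'
    ;

fragment CAPITAL_LETTER
    : 'A'..'Z'
    ;
```

With this layout, a rule that needs only capitalized words can reference CAPITAL_WORD directly (for example word : CAPITAL_WORD EOF ;), and "Hello" should still be accepted, because the lexer emits CAPITAL_WORD for it.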
Upvotes: 1