Reputation: 7387
I have a short question:
// Lexer
LOOP_NAME : (LETTER|DIGIT)+;
OTHERCHARS : ~('>' | '}')+;
LETTER : ('A'..'Z')|('a'..'z');
DIGIT : ('0'..'9');
A_ELEMENT
: (LETTER|'_')*(LETTER|DIGIT|'_'|'.');
// Parser-Konfiguration
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
My problem is that this is impossible due to:
As a result, alternative(s) 2 were disabled for that input [14:55:32] error(208): ltxt2.g:61:1: The following token definitions can never be matched because prior tokens match the same input: LETTER,DIGIT,A_ELEMENT,WS
My issue is that I also need to catch UTF8 with OTHERCHARS... and I cannot put all special UTF8 chars into a Lexer rule since I cannot range like ("!".."?").
So I need the NOT (~). The OTHERCHARS here can be everything but ">" or "}". These two close a literal context and are forbidden within.
It doesn't seem such cases are referenced very well, so I'd be happy if someone knew a workaround. The NOT operator here creates the ambivalence I need to solve.
Thanks in advance.
Best, wishi
Upvotes: 0
Views: 312
Reputation: 100029
Move OTHERCHARS
to the very end of the lexer and define it like this:
OTHERCHARS : . ;
In the Java target, this will match a single UTF-16 code point which is not matched by a previous rule. I typically name the rule ANY_CHAR
and treat it as a fall-back. By using .
instead of .+
, the lexer will only use this rule if no other rule matches.
ANY_CHAR
due to matching a larger number of characters from the input.ANY_CHAR
due to appearing earlier in the grammar.Edit: To exclude }
and >
from the ANY_CHAR
rule, you'll want to create rules for them so they are covered under point 2.
RBRACE : '}' ;
GT : '>' ;
ANY_CHAR : . ;
Upvotes: 1