Reputation: 1188
I am trying to do basic ANTLR-based scanning. I have a problem with a lexer not matching wanted tokens.
lexer grammar DefaultLexer;
ALPHANUM : (LETTER | DIGIT)+;
ACRONYM : LETTER '.' (LETTER '.')+;
HOST : ALPHANUM (('.' | '-') ALPHANUM)+;
fragment
LETTER : UNICODE_CLASS_LL | UNICODE_CLASS_LM | UNICODE_CLASS_LO | UNICODE_CLASS_LT | UNICODE_CLASS_LU;
fragment
DIGIT : UNICODE_CLASS_ND | UNICODE_CLASS_NL;
For the grammar above, the string "hello. world" given as input results in "world" only, whereas I would expect to get both "hello" and "world". What am I missing? Thanks.
ADDED:
Ok, I learned that the input "hello. world" matches more characters using rule HOST than ALPHANUM, therefore the lexer will choose to use it. Then, when it fails to match the input to the HOST rule, it does not "look back", because that's how the lexer works.
How do I get around it?
Upvotes: 0
Views: 220
Reputation: 99999
As a foreword, ANTLR 4 would not behave in a strange manner here. Both ANTLR 3 and ANTLR 4 should be matching ALPHANUM, then giving 2 syntax errors, then matching another ALPHANUM, and I can state with confidence that ANTLR 4 will behave that way.
The HOST rule might be better suited to be host, a parser rule, with '.' and '-' then matched by lexer rules of their own (either together or as two separate tokens).
Upvotes: 1
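A minimal sketch of the suggestion above, assuming ANTLR 4 and a combined grammar. The names HostSketch, text, DOT, DASH and WS are hypothetical and not from the question. LETTER and DIGIT are simplified to ASCII here because the UNICODE_CLASS_* fragments are not shown in the question, and ACRONYM is left out for brevity. Whitespace is kept as a real token so that "hello. world" is not read as one host.

```antlr
grammar HostSketch;   // hypothetical grammar name

// Parser rules. host is now assembled from tokens by the parser, so a
// failed host match no longer swallows the ALPHANUM in front of it.
// host is listed first so it is preferred when both host and a bare
// ALPHANUM could match.
text : (host | ALPHANUM | DOT | DASH | WS)* EOF ;
host : ALPHANUM ((DOT | DASH) ALPHANUM)+ ;

// Lexer rules. '.' and '-' get tokens of their own
// (two separate tokens in this sketch).
ALPHANUM : (LETTER | DIGIT)+ ;
DOT      : '.' ;
DASH     : '-' ;
WS       : [ \t\r\n]+ ;   // assumption: kept visible to the parser, not skipped

// ASCII-only stand-ins for the Unicode classes used in the question.
fragment LETTER : [a-zA-Z] ;
fragment DIGIT  : [0-9] ;
```

With these lexer rules, the input "hello. world" is tokenized as ALPHANUM, DOT, WS, ALPHANUM, so both words come through as tokens, while an input such as "example.com" can still be grouped by the host parser rule.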