Reputation:
The initial title question was: Why does my lexer rule not work, until I change it to a parser rule? The contents below are related to this question. Then I found new information and changed the title question. Please see my comment!
My Antlr Grammar (only the "Spaces" rule and its use are important).
Because my input comes from an OCR source, there can be multiple consecutive whitespace characters, but I still need to recognize the spaces because they carry meaning for the text structure. For this reason I defined the following in my grammar:
Spaces: Space (Space Space?)?;
but this throws the error above: the whitespace is not recognized. When I replace it with a parser rule (lowercase!) in my grammar
spaces: Space (Space Space?)?;
the error seems to be solved (subsequent errors appear - not part of this question).
So why is the error solved in this concrete case when I use a parser rule instead of a lexer rule? And in general, when should I use a lexer rule and when a parser rule?
Thank you, guys!
Upvotes: 0
Views: 234
Reputation: 241691
A single space is being recognized as a Space and not as a Spaces, since it matches both lexical rules and Space comes first in the grammar file. (You can see that token type 1 is being recognized; Spaces would be type 9 by my count.)
Antlr uses the common "maximum munch" lexical strategy: the token recognized corresponds to the longest possible match, and if two rules match the same longest text, the rule that appears first in the file wins. When you put Spaces first in the file, it wins the tie. If you make it a parser rule instead of a lexical rule, then it gets applied after the unambiguous lexical rule for Space.
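To make the tie-breaking concrete, here is a small Python sketch of a maximal-munch tokenizer. It is only an illustration of the strategy described above, not Antlr's actual implementation; the tokenize function, the rule lists, and the Word rule are made up for the example, and the regex " {1,3}" stands in for Space (Space Space?)?.

```python
import re

def tokenize(text, rules):
    """Longest match wins; on equal length, the earliest rule in the list wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Strict '>' keeps the earlier rule when two matches have equal length.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Hypothetical rule sets: same rules, different order in the "grammar file".
space_first = [("Space", r" "), ("Spaces", r" {1,3}"), ("Word", r"[a-z]+")]
spaces_first = [("Spaces", r" {1,3}"), ("Space", r" "), ("Word", r"[a-z]+")]

print(tokenize("a b", space_first))   # single space: tie, Space is listed first
print(tokenize("a b", spaces_first))  # single space: tie, Spaces is listed first
print(tokenize("a  b", space_first))  # two spaces: Spaces is the longer match
```

With Space listed first, a single space can never come out as Spaces, which is the situation in the question; two or three spaces still become Spaces because the longer match takes priority over file order.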
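Do you really only want to allow up to 3 spaces? Otherwise, you could just ditch Space and define Spaces as ' '+ (one or more spaces; Antlr string literals use single quotes, and a lexer rule should not be able to match the empty string).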
Upvotes: 0