ANTLR lexer rules: two rules, one accepting every character including symbols and another accepting only letters

Is it possible in ANTLR to have two lexer rules, one that accepts every character including all symbols (like (, ), _ etc.) and another that accepts only the letters a to z?

Something like below:

String: ('a'..'z'|'A'..'Z')+;
EVERYTHING: .+;

Answers (1)

Jiri Tousek

Yes, it is possible.

This is how the ANTLR lexer decides which rule to use:

  • the rule that can match the longest sub-sequence of the input (starting from the current position in the input) wins
  • if more than one rule can match that longest sub-sequence (i.e. it's a tie), the rule defined first in the grammar file wins

So in your case, for alpha-only input both rules match the same number of characters, but since String is defined before EVERYTHING in the grammar, String is used. If the input contains non-alpha characters, the EVERYTHING rule can match a longer sub-sequence and is therefore used.
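To see the two selection rules in action, here is a small Python simulation. It is not ANTLR itself, just a sketch of the same "longest match wins, first rule wins ties" choice. Assumptions: the question's rules are translated to Python regexes, `*` is treated as `+` so neither rule can match the empty string, and `tokenize` is a hypothetical helper name.

```python
import re
from typing import List, Optional, Tuple

# The question's two rules, in grammar order, translated to Python regexes.
# Assumption: '*' is treated as '+' so neither rule can match the empty string.
# re.DOTALL makes '.' match newlines too, like ANTLR's '.' wildcard.
RULES = [
    ("String", re.compile(r"[a-zA-Z]+")),
    ("EVERYTHING", re.compile(r".+", re.DOTALL)),
]

def tokenize(text: str, rules=RULES) -> List[Tuple[str, str]]:
    """Simulate the lexer's choice: longest match wins, earlier rule wins ties."""
    tokens, pos = [], 0
    while pos < len(text):
        best: Optional[Tuple[str, str]] = None
        for name, pattern in rules:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best, so on a
            # tie the rule listed first keeps the token.
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append(best)
        pos += len(best[1])
    return tokens

print(tokenize("hello"))               # [('String', 'hello')]
print(tokenize("hello(world)"))        # [('EVERYTHING', 'hello(world)')]
print(tokenize("hello world"))         # [('EVERYTHING', 'hello world')]
# Swapping the rule order flips the tie-break:
print(tokenize("hello", RULES[::-1]))  # [('EVERYTHING', 'hello')]
```

The last call shows why rule order matters: with EVERYTHING listed first, it wins the tie on pure-letter input and String is never produced.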

Note, however, that as written your EVERYTHING rule also matches spaces and newlines. In this specific case the String rule will be used only if the whole input consists of alpha characters and nothing else, and the whole input will be matched as a single token either way. In a real grammar, the EVERYTHING rule would probably be more restrictive (for example, not matching whitespace).
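Here is the same kind of Python simulation with a hypothetical, more restrictive EVERYTHING that excludes whitespace, plus a hypothetical WS rule whose tokens are dropped. In ANTLR 4 syntax this would be roughly `EVERYTHING: ~[ \t\r\n]+;` and `WS: [ \t\r\n]+ -> skip;`; that translation is an assumption, not something from the question. String now wins for whitespace-separated alphabetic words, while anything containing a symbol still goes to EVERYTHING because the longer match wins.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical rules in grammar order. EVERYTHING no longer matches whitespace,
# and a separate WS rule consumes the gaps (ANTLR would typically use "-> skip").
RULES = [
    ("String", re.compile(r"[a-zA-Z]+")),
    ("EVERYTHING", re.compile(r"[^ \t\r\n]+")),
    ("WS", re.compile(r"[ \t\r\n]+")),
]

def tokenize(text: str) -> List[Tuple[str, str]]:
    """Longest match wins, earlier rule wins ties; WS tokens are dropped."""
    tokens, pos = [], 0
    while pos < len(text):
        best: Optional[Tuple[str, str]] = None
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best[0] != "WS":  # emulate "-> skip"
            tokens.append(best)
        pos += len(best[1])
    return tokens

print(tokenize("hello foo(bar)\nworld"))
# [('String', 'hello'), ('EVERYTHING', 'foo(bar)'), ('String', 'world')]
```

Note that `foo(bar)` still becomes a single EVERYTHING token: String can only match `foo`, and the longer match wins.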