ANTLR : A lexer or a parser error?

Question

I wrote a simple lexer in ANTLR, and the grammar for ID looks like this:

ID  :   (('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*|'_'('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*);

(No digits are allowed at the beginning)

When I generated the code (in Java) and tested the input:

3a

I expected an error, but the input was recognized as "INT ID". How can I fix the grammar so that it reports an error (using only lexer rules)?

Thanks for your attention

Bart Kiers · Accepted Answer

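Note that your rule could be simplified to the following (this version also accepts "_" on its own or "_" followed by a digit, which your original rule rejects):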

ID
 : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')*
 ;

or with fragments (rules that won't produce tokens, but are only used by other lexer rules):

ID
 : (Letter | '_') (Letter | Digit | '_')*
 ;

fragment Letter
 : 'a'..'z'
 | 'A'..'Z'
 ;

fragment Digit
 : '0'..'9'
 ;

But if input like "3a" is recognized by your lexer and produces the tokens INT and ID, then you shouldn't change anything. The problem with such input would show up in your parser rule(s) instead: an INT directly followed by an ID is presumably not valid anywhere in your language, so the parser can report it.
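To see why "3a" comes out as INT ID rather than as an error, here is a hand-written Java sketch (not code that ANTLR generates) of a lexer with only an INT rule and the simplified ID rule above. The question doesn't show the INT rule, so defining it as Digit+ is an assumption, and the class name MiniLexer and the "TYPE:text" token format are made up for illustration. Much like ANTLR's lexer, it matches as many characters as one rule allows and then starts a new token, so "3" ends the INT and "a" starts an ID.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer (not ANTLR output) with two rules:
//   INT : Digit+ ;                                  (assumed definition)
//   ID  : (Letter | '_') (Letter | Digit | '_')* ;
class MiniLexer {

    static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Splits the input into "TYPE:text" strings. Each rule consumes as much
    // as it can; when it stops, a new token starts at the next character.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int start = i;
            if (Character.isWhitespace(c)) {
                i++; // skip whitespace
            } else if (isDigit(c)) {
                // INT: digits only, so a following letter is left for the next token
                while (i < input.length() && isDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add("INT:" + input.substring(start, i));
            } else if (isLetter(c) || c == '_') {
                i++;
                while (i < input.length()) {
                    char d = input.charAt(i);
                    if (isLetter(d) || isDigit(d) || d == '_') {
                        i++;
                    } else {
                        break;
                    }
                }
                tokens.add("ID:" + input.substring(start, i));
            } else {
                throw new IllegalArgumentException("unexpected char '" + c + "' at index " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("3a")); // [INT:3, ID:a]
    }
}
```

Neither token is invalid on its own, which is why the lexer is content; only a parser rule that doesn't allow INT followed by ID will complain.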

If you really want to let the lexer handle this kind of stuff, you could do something like this:

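// 'if (true)' keeps javac from rejecting the code ANTLR generates after the action as unreachable
INT
 : Digit+ (Letter { if (true) throw new RuntimeException("a number cannot be directly followed by a letter"); })?
 ;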

And if you want to allow INT literals to end with an f or L, you'd first have to inspect the contents of Letter, and only if it's not "f" or "L" do you throw an exception.
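The check described above can be sketched in plain Java. IntRule.matchInt is a hypothetical helper, not part of ANTLR's API: it consumes the digits and, if a single letter follows, accepts it only when it is "f" or "L", mirroring Digit+ (Letter { ... })?. Throwing IllegalArgumentException is an arbitrary choice for this sketch; inside a real ANTLR action you would throw whatever your error handling expects.

```java
// Hypothetical helper (not ANTLR API) mirroring:
//   INT : Digit+ (Letter { check the letter })? ;
class IntRule {

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Matches an INT starting at 'start' and returns the index just past it.
    // A single trailing letter is accepted only if it is 'f' or 'L';
    // any other letter makes the literal invalid.
    static int matchInt(String input, int start) {
        int i = start;
        while (i < input.length() && isDigit(input.charAt(i))) {
            i++;
        }
        if (i == start) {
            throw new IllegalArgumentException("expected a digit at index " + start);
        }
        if (i < input.length() && isLetter(input.charAt(i))) {
            char suffix = input.charAt(i);
            if (suffix != 'f' && suffix != 'L') {
                // Assumption: an exception is how the error gets reported.
                throw new IllegalArgumentException(
                    "invalid suffix '" + suffix + "' after number at index " + start);
            }
            i++; // the f/L suffix becomes part of the INT token
        }
        return i;
    }

    public static void main(String[] args) {
        System.out.println(matchInt("10L", 0)); // 3
        try {
            matchInt("3a", 0);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With this check "10L" and "3f" are each a single INT, while "3a" is rejected as soon as the "a" is seen.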
