Reputation: 21
ANTLR seems to silently ignore characters such as ~, @, #, $, %, *, (, ), {, }, [, ] in the input string.
I tested the grammar below with input strings such as show~~~ and show ~@#$%, but ANTLR skips over those characters in the Eclipse/ANTLRWorks interpreter. I want such input to throw an exception rather than be recovered from. Please let me know if you have faced this before and, if so, how you got rid of it.
grammar Sample;

options {language = Java;}
@header {package a.b.c;}
@lexer::header {package a.b.c;}

prog: stat+ ;
stat: expr ;
expr: paramValueChildStructure ;
paramValueChildStructure: ALPHANUMERIC;
ALPHANUMERIC: ('a'..'z' |'A'..'Z' | '0'..'9')+ ;
I tried the rule below to get rid of this issue, but it causes an "unreachable code" compile-time error in my generated lexer.java:
OTHER : . {throw new RuntimeException("unknown char: '" + $text + "'");};
Thanks, Ashish
Upvotes: 2
Views: 542
Reputation: 53552
Look here: http://www.antlr3.org/wiki/display/ANTLR3/Custom+Syntax+Error+Recovery
The last paragraph before the conclusion is probably what you need:
Other Recovery Mechanisms Within ANTLR Runtimes
There is one other aspect of recovery which you may need to customize, and that is what happens when a mismatch() occurs. You will see in the generated code that there are lots of calls to the match() method. Inspecting the default implementation (in the Java runtime) we find that the match method will call the method recoverFromMismatchedToken() and this in turn will try to use the current Follow set stack to determine if the reason we mismatched is that there was a spurious token in the input: X Y Z when we wanted just X Z, or a missing token: X Z when we wanted X Y Z. If ANTLR can determine, using the Follow sets, that by skipping a token, it would see valid syntax, then it will consume the spurious token, report the extra token, but will not raise a RecognitionException. Similarly, if ANTLR can see that there is exactly one token missing from the input stream, which if present, would make the syntax valid, then it will manufacture this missing token, report the error, but again will not raise the RecognitionException.
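To make the two repairs described above concrete, here is a toy model in plain Java. It is not ANTLR's code: the class `RecoveryDemo`, its `match` method and the `MismatchException` type are hypothetical names invented for this sketch, and a fixed expected token sequence stands in for the Follow sets. It only illustrates the order of the checks (first try dropping a spurious token, then try inventing a missing one) and what "strict" behavior looks like when both are switched off.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy illustration only; names are hypothetical and unrelated to the ANTLR runtime.
final class RecoveryDemo {
    static final String EOF = "<EOF>";

    /** Thrown when a mismatch cannot (or must not) be repaired. */
    static class MismatchException extends RuntimeException {
        MismatchException(String message) { super(message); }
    }

    /**
     * Matches input against a fixed expected sequence.
     * With recover=true a single spurious or missing token is repaired and
     * reported; with recover=false the first mismatch throws.
     */
    static List<String> match(List<String> expected, List<String> input, boolean recover) {
        List<String> reports = new ArrayList<>();
        int i = 0; // current position in the input
        for (int e = 0; e < expected.size(); e++) {
            String want = expected.get(e);
            String got = i < input.size() ? input.get(i) : EOF;
            if (got.equals(want)) {
                i++;
                continue;
            }
            if (!recover) {
                throw new MismatchException("expected " + want + " but found " + got);
            }
            // Repair 1: the token after the current one is what we want,
            // so treat the current token as spurious and skip it.
            if (i + 1 < input.size() && input.get(i + 1).equals(want)) {
                reports.add("extraneous " + got);
                i += 2; // consume the spurious token and the wanted one
                continue;
            }
            // Repair 2: the current token is what should follow the wanted one,
            // so pretend the wanted token was present and consume nothing.
            String next = e + 1 < expected.size() ? expected.get(e + 1) : EOF;
            if (got.equals(next)) {
                reports.add("missing " + want);
                continue;
            }
            throw new MismatchException("expected " + want + " but found " + got);
        }
        if (i < input.size()) {
            throw new MismatchException("unexpected trailing input " + input.get(i));
        }
        return reports;
    }

    public static void main(String[] args) {
        // Wanted X Z, got X Y Z: Y is dropped and reported.
        System.out.println(match(Arrays.asList("X", "Z"), Arrays.asList("X", "Y", "Z"), true));
        // Wanted X Y Z, got X Z: Y is manufactured and reported.
        System.out.println(match(Arrays.asList("X", "Y", "Z"), Arrays.asList("X", "Z"), true));
    }
}
```

Calling `match` with `recover` set to `false` on either of the inputs in `main` throws instead of returning a report, which is the behavior the question asks for.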
If you want behavior that is different to this, then you can override the match() method, or more likely, the recoverFromMismatchedToken() method. Perhaps you do not want the spurious/missing error detection? Or, as you will see from the default implementation, ANTLR will first see if it can fix things by ignoring a token, then go on to see if it can fix things by adding a token. However, there are some syntax errors that can be recovered using either method - perhaps you want to reverse the order that these strategies are tried?
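Applied to the grammar in the question, the overrides might look roughly like the fragment below. This is a sketch, not a tested drop-in: it assumes the ANTLR 3.1 or later Java runtime, where the parser hook is recoverFromMismatchedToken() (older 3.0 runtimes used mismatch() instead), so check the signatures against the BaseRecognizer and Lexer classes in your version. The lexer part is an addition of mine rather than something the quoted wiki text covers: it overrides reportError() so that characters matching no token rule raise an exception instead of being reported and skipped. The message text is only an example.

```
// Parser: fail on the first mismatch instead of repairing the token stream.
@members {
  @Override
  protected Object recoverFromMismatchedToken(IntStream input, int ttype, BitSet follow)
      throws RecognitionException {
    throw new MismatchedTokenException(ttype, input);
  }

  @Override
  public Object recoverFromMismatchedSet(IntStream input, RecognitionException e, BitSet follow)
      throws RecognitionException {
    throw e;
  }
}

// Lexer: turn "no viable alternative at character ..." into an exception.
@lexer::members {
  @Override
  public void reportError(RecognitionException e) {
    throw new RuntimeException(
        "unknown char at " + e.line + ":" + e.charPositionInLine, e);
  }
}

// Rethrow from every parser rule instead of running the generated catch block.
@rulecatch {
  catch (RecognitionException e) { throw e; }
}
```

With the lexer override in place, the OTHER catch-all rule from the question should not be needed, which also sidesteps the unreachable-code error caused by an unconditional throw inside a lexer action.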
Upvotes: 1