Reputation: 5668

antlr length of token and error handling

I'm using ANTLR version 3.4.

First question. Here is the grammar:

request: 'C' DELIM source DELIM target
  { System.out.println("Hi"); }
  ;
source: ID ;
target: ID ;

DELIM: '|' ;
fragment ALPHA: 'a'..'z' | 'A'..'Z' ;
fragment NUM: '0'..'9' ;
ID: ALPHA (ALPHA | NUM)* ;

"source" and "target" cannot be empty. But my test shows the following:

for input "C|n1|n2": normal case, no problem.
for input "C||n2": syntax error, and "Hi" is not printed, as expected.
for input "C|n1|": syntax error, but "Hi" is printed. Not good.

I need to set some other things once the "request" rule is matched. But as shown above, even with a syntax error the code still executes the action in "request". Why?

Second question: how do I specify a rule for a fixed-length token, for example, a token of exactly 10 digits?

Third question is about error handling. I override emitErrorMessage() in the parser to set an error flag, but I found that the lexer has its own emitErrorMessage(). I don't want to share the error flag between the parser and lexer objects. Can I override emitErrorMessage() in the lexer to do nothing, and rely entirely on the parser to report errors? Or, put another way: if there is an error, is the parser guaranteed to catch it?

And if the error flag is set for one error, can the parser actually recover and match another rule, making the earlier error a false alarm?

Thanks for any help!

Upvotes: 0

Answers (2)

Michael Chen

Reputation: 5668

Bart, your help is great. I also thought it through and now understand that the behavior in question #1 is legitimate: like a compiler, the parser recovers and continues so that it can find as many errors as possible.

For question #2, I also figured out a way to enforce a fixed length. I don't know if it's the usual way:

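example : exact3 '|' exact4 ;

// method 1: collect the DIGIT tokens in a list, then check its size
exact3 : (d+=DIGIT)+ {$d != null && $d.size() == 3}? ;

// method 2: a gated predicate stops the loop after 4 digits,
// then a validating predicate checks that exactly 4 were matched
exact4 : atmost4 {$atmost4.text.length() == 4}? ;
atmost4
@init {int n = 1;}
  : ({n <= 4}?=> DIGIT {n++;})+
  ;

DIGIT : '0'..'9' ;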

For question #3, I'll fail on the first error, i.e. override emitErrorMessage() in both the lexer and the parser to throw an exception. I chose emitErrorMessage(msg) because it receives the error message already formatted.

Thanks to everyone who shared!

Upvotes: 0

Bart Kiers

Reputation: 170227

...

for input "C|n1|": syntax error, but "Hi" is printed. Not good.

I need to set some other things once the "request" rule is matched. But as shown above, even with a syntax error the code still executes the action in "request". Why?

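Because the parser tries to recover from the error: here it reports the missing ID, acts as if it had been present, and carries on, so the action in request still runs. If you don't want the parser to (try to) recover from mismatched tokens, simply throw an exception, like this: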

grammar T;

// options...

@members {
  @Override
  public void emitErrorMessage(String message) {
    throw new RuntimeException(message);
  }
}

request
 : 'C' DELIM source DELIM target { System.out.println("Hi"); }
 ;

// more rules...

Note that @members is short for @parser::members: it only overrides emitErrorMessage(...) in the parser, not in the lexer. For lexer members, use @lexer::members.
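A minimal sketch of overriding emitErrorMessage(...) in both recognizers of the combined grammar T, so that any lexer or parser error aborts the run. Throwing a plain RuntimeException is just one choice; you may prefer an exception type of your own.

```antlr
grammar T;

// parser side (@members on its own means the same thing)
@parser::members {
  @Override
  public void emitErrorMessage(String message) {
    throw new RuntimeException(message);
  }
}

// lexer side: without this block the lexer keeps printing to stderr and continues
@lexer::members {
  @Override
  public void emitErrorMessage(String message) {
    throw new RuntimeException(message);
  }
}

// request, source, target, DELIM, ID, ... as in the question
```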

Second question: how do I specify a rule for a fixed-length token, for example, a token of exactly 10 digits?

See: ANTR3 set the number of accepted characters for a token
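ANTLR 3 has no `{10}`-style repetition operator, so one straightforward option is to spell out ten DIGIT fragments in a lexer rule. A sketch; the rule name TEN_DIGITS is hypothetical:

```antlr
// exactly ten digits; DIGIT is a fragment, so it never becomes a token on its own
TEN_DIGITS
  : DIGIT DIGIT DIGIT DIGIT DIGIT
    DIGIT DIGIT DIGIT DIGIT DIGIT
  ;

fragment DIGIT : '0'..'9' ;
```

With only this rule, a run of fewer than ten digits is a lexer error, and a longer run is cut into ten-digit tokens (with an error for any leftover digits). If that is not what you want, a general digit rule plus a length check in the parser, as in Michael Chen's answer, may fit better.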

Third question is about error handling. ...

See the first part of my answer: simply override emitErrorMessage() and do nothing in it (by default it prints the message to stderr).
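If, as in the question, you would rather record the error than abort, a sketch along these lines keeps a flag in the parser only; the field errorSeen and the method hadError() are hypothetical names. For the input "C|n1|" the missing ID should be reported while target is being matched, before the action in request runs, so the action can check the flag:

```antlr
@parser::members {
  // hypothetical flag, set whenever the parser reports a syntax error
  private boolean errorSeen = false;

  public boolean hadError() {
    return errorSeen;
  }

  @Override
  public void emitErrorMessage(String message) {
    errorSeen = true; // record the error instead of printing it
  }
}

request
  : 'C' DELIM source DELIM target
    {
      if (!errorSeen) {
        System.out.println("Hi");
      }
    }
  ;
```

This flag only covers parser errors; lexer errors go through the lexer's own emitErrorMessage(), which would need its own @lexer::members override.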

Can I override emitErrorMessage() in the lexer to do nothing, and rely entirely on the parser to report errors?

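Well, the parser and lexer handle different types of errors, so silently ignoring certain problems in the lexer might not cause the parser to produce any warning or error. For example, with the grammar from the question, the input "C|n#1|n2" makes the lexer report and skip the characters '#' and '1' (no token rule can start with either), after which the parser receives the valid sequence 'C' | n | n2 and reports nothing.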

Upvotes: 1
