Heinrich

Reputation: 289

ANTLR basic behavior

I use the following Java code to instantiate a parser generated with ANTLR.

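package foo;

import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CharStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RecognitionException;

public class Test1 {
    public static void main(String[] args) throws RecognitionException {
        // Note the trailing space in the input.
        CharStream stream = new ANTLRStringStream("foo ");
        BugLexer lexer = new BugLexer(stream);
        CommonTokenStream tokenStream = new CommonTokenStream(lexer);
        BugParser parser = new BugParser(tokenStream);
        // Invoke the start rule of the grammar.
        parser.specification();
    }
}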

My grammar:

grammar Bug;
options {
  language = Java;
}
@header {
  package foo;
}
@lexer::header {
  package foo;
}
specification : 
   'foo' EOF 
;  
WS 
   : (' ' | '\t' | '\n' | '\r')+ {$channel = HIDDEN;} 
;
SCOLON
   : (~ ';')+
;

And the error I get:
line 1:0 mismatched input 'foo ' expecting 'foo'
I would expect the space in the input to be ignored, but it's not. The ANTLR interpreter in Eclipse says it's fine, so I suppose my Java code is wrong somehow, but I just don't see it. Note: if I remove the rule for SCOLON, then there is no error for this input.

Upvotes: 1

Views: 301

Answers (1)

Bart Kiers

Reputation: 170138

ANTLR's lexer tries to match as much as possible for each token. Therefore, "foo " is tokenized as a single SCOLON token and not as a 'foo' token followed by a WS token.

Note that your SCOLON rule:

SCOLON
 : (~ ';')+
 ;

suggests by its name that it matches just a single semicolon, but in fact it matches one or more characters other than a semicolon. Perhaps it should have been this instead:

SCOLON
 : ';'
 ;

?

EDIT

Heinrich Ody wrote:

I somehow thought there is a priority (given by order of declaration) on which token ANTLR attempts to match the input. Thanks for your response.

That is correct, but only as a tie-breaker: whenever two (or more) rules match the same number of characters, the rule defined first will "win". But if a rule defined later matches more characters than the earlier ones, it "wins" regardless of the order of declaration.
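
To make the two rules above concrete, here is a minimal sketch in plain Java that mimics "longest match wins, declaration order breaks ties". It is a simplified model built on java.util.regex and is not how ANTLR's generated lexer is actually implemented. The class name MaximalMunchDemo, the helper methods, and the rule name FOO (standing in for the implicit token ANTLR creates for the literal 'foo') are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MaximalMunchDemo {

    // Rules as in the question's grammar. FOO is a hypothetical name for the
    // implicit token of the literal 'foo'; it is assumed to come first.
    public static Map<String, Pattern> originalRules() {
        Map<String, Pattern> rules = new LinkedHashMap<>(); // keeps declaration order
        rules.put("FOO", Pattern.compile("foo"));
        rules.put("WS", Pattern.compile("[ \\t\\n\\r]+"));
        rules.put("SCOLON", Pattern.compile("[^;]+"));      // (~ ';')+
        return rules;
    }

    // Same rules, but with SCOLON matching a single semicolon.
    public static Map<String, Pattern> fixedRules() {
        Map<String, Pattern> rules = originalRules();
        rules.put("SCOLON", Pattern.compile(";"));          // same position, new pattern
        return rules;
    }

    // At each position, try every rule and keep the longest match.
    // The strict '>' means an earlier rule wins when lengths are equal.
    public static List<String> tokenize(String input, Map<String, Pattern> rules) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestName = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestName == null) {
                throw new IllegalStateException("no rule matches at index " + pos);
            }
            tokens.add(bestName + "('" + input.substring(pos, bestEnd) + "')");
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // SCOLON matches 4 characters, FOO only 3: the longer match wins.
        System.out.println(tokenize("foo ", originalRules())); // [SCOLON('foo ')]
        // FOO and SCOLON both match 3 characters: the earlier rule wins.
        System.out.println(tokenize("foo", originalRules()));  // [FOO('foo')]
        // With SCOLON : ';' the input splits as the question expected.
        System.out.println(tokenize("foo ", fixedRules()));    // [FOO('foo'), WS(' ')]
    }
}
```

The first call reproduces the situation from the question: the whole input, including the trailing space, ends up in one SCOLON token, so the parser never sees a 'foo' token. In the real grammar the WS token would also be sent to the hidden channel; this sketch simply lists it.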

Upvotes: 2
