BlockyPenguin

Reputation: 75

ANTLR4 not recognising a rule

In my g4 file, I have defined an integer like so:

INT: '0'
   | '-'? [1-9] [0-9_]*
   ;
   // no leading zeros are allowed!

A parser rule uses this like so:

versionDecl: PACK_VERSION_DECL INT;

However, when ANTLR comes across one, it doesn't recognise it, and I get a NullPointerException when I call ctx.INT().getText() in my listener:

@Override
public void exitVersionDecl(VersionDeclContext ctx) {
    System.out.println(ctx.INT().getText());
}

Log:

line 1:13 mismatched input '6' expecting INT
[...]
java.lang.NullPointerException
    at com.blockypenguin.mcfs.MCFSCustomListener.exitVersionDecl(MCFSCustomListener.java:16)
    at main.antlr.MCFSParser$VersionDeclContext.exitRule(MCFSParser.java:604)
    at org.antlr.v4.runtime.tree.ParseTreeWalker.exitRule(ParseTreeWalker.java:47)
    at org.antlr.v4.runtime.tree.ParseTreeWalker.walk(ParseTreeWalker.java:30)
    at org.antlr.v4.runtime.tree.ParseTreeWalker.walk(ParseTreeWalker.java:28)
    at org.antlr.v4.runtime.tree.ParseTreeWalker.walk(ParseTreeWalker.java:28)
    at com.blockypenguin.mcfs.Main.main(Main.java:40)

(Unrelated output omitted for brevity)

And finally, the input I am parsing:

pack_version 6

Why does ANTLR not recognise the integer? Any help appreciated, thank you :)

Upvotes: 1

Views: 689

Answers (1)

Bart Kiers

Reputation: 170158

...

INT: '0'
   | '-'? [1-9] [0-9_]*
   ;
   // no leading zeros are allowed!

...

line 1:13 mismatched input '6' expecting INT

This error indicates that for the input 6, the lexer rule INT was not matched. This can happen if you have a lexer rule defined before the INT rule that also matches 6. Like this, for example:

DIGIT
 : [0-9]
 ;

...

INT
 : '0'
 | '-'? [1-9] [0-9_]*
 ;

Now the input "6" (or any single digit) will be matched as a DIGIT token. Even if you have this in the parser part of your grammar:

parse
 : INT
 ;

the input "6" will still be tokenised as a DIGIT token: the lexer is not "driven" by the parser; it operates on its own two rules:

  1. try to match as many characters as possible for a single lexer rule
  2. in case 2 or more lexer rules match the same number of characters, let the rule defined first "win"

So, the input "12" will be tokenised as an INT token (rule 1 applies here), and input "0" is tokenised as a DIGIT token (rule 2).
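
To make the two rules concrete, here is a minimal sketch in plain Java that imitates them with java.util.regex. It is not how ANTLR is implemented and it does not use the ANTLR runtime. The class name LexerRulesSketch and the method tokenise are hypothetical names chosen for illustration, and the regex patterns are hand translations of the DIGIT and INT rules above. The sketch relies on greedy regex matching giving the longest match, which holds for these two simple patterns but not for regexes in general.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of ANTLR or of the asker's project.
class LexerRulesSketch {
    // Rules kept in declaration order: DIGIT comes before INT, as in the grammar above.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("DIGIT", Pattern.compile("[0-9]"));
        RULES.put("INT", Pattern.compile("0|-?[1-9][0-9_]*"));
    }

    static List<String> tokenise(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestRule = null;
            int bestLength = 0;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Rule 1: a longer match replaces the current best.
                // Rule 2: the comparison is strict, so on a tie the rule
                // declared earlier keeps its place.
                if (m.lookingAt() && m.end() - pos > bestLength) {
                    bestLength = m.end() - pos;
                    bestRule = rule.getKey();
                }
            }
            if (bestRule == null) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            tokens.add(bestRule + "(" + input.substring(pos, pos + bestLength) + ")");
            pos += bestLength;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenise("6"));  // [DIGIT(6)]
        System.out.println(tokenise("12")); // [INT(12)]
        System.out.println(tokenise("0"));  // [DIGIT(0)]
    }
}
```

In this sketch, swapping the two RULES.put calls so that INT is registered first makes the tie for "6" go to INT instead, which mirrors the effect of rule order in the grammar.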

Upvotes: 1
