Andrew B

Reputation: 23

Java expression parsing with ANTLR

I'm writing a toolkit in Java that uses Java expression parsing. I thought I'd try using ANTLR since

  1. It seems to be used ubiquitously for this sort of thing
  2. There don't seem to be a lot of open source alternatives
  3. I actually tried to write my own generalized parser a while back and gave up. That stuff's hard.

I have to say, after what I feel is a lot of reading and trying different things (more than I had expected to spend, anyway), ANTLR seems incredibly difficult to use. The API is very unintuitive--I'm never quite sure whether I'm calling it right.

Although ANTLR tutorials and examples abound, I haven't had luck finding any examples that involve parsing Java "expressions" -- everyone else seems to want to parse whole Java files.

I started off calling it like this:

        Java8Lexer lexer = new Java8Lexer(CharStreams.fromString(text));
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        Java8Parser parser = new Java8Parser(tokens);
        ParseTree result = parser.expression();

but that wouldn't parse the whole expression. E.g. with text "a.b" it would return a result that only consisted of the "a" part, just quitting after the first thing it could parse.

Fine. So I changed to:

        String input = "return " + text + ";";
        Java8Lexer lexer = new Java8Lexer(CharStreams.fromString(input));
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        Java8Parser parser = new Java8Parser(tokens);
        ParseTree result = parser.returnStatement();
        result = result.getChild(1);

thinking this would force it to parse the entire expression, then I could just extract the part I cared about. That worked for name expressions like "a.b", but if I try to parse a method expression like "a.b.c(d)" it gives an error:

line 1:12 mismatched input '(' expecting '.'

Interestingly, a(), a.b(), and a.b.c parse fine, but a.b.c() also dies with the same error.

Is there an ANTLR expert here who might have an idea what I'm doing wrong?

Separately, it bothers me quite a bit that the error above is printed to stderr, but I can't find it in the result object anywhere. I'd like to be able to present that error message (vague as it is) to the user that entered the expression--they may not be looking at a console, and even if they are, there's no context there. Is there a way to find that information in the result I get back?

Any help is greatly appreciated.

Upvotes: 1

Views: 957

Answers (1)

Mike Cargal

Reputation: 6785

For a rule like expression, ANTLR will stop parsing once it recognizes an expression.

You can force it to continue by adding an `EOF` to your start rule.

(You don’t want to modify the actual `expression` rule, but you can add a rule like this:)

expressionStart: expression EOF;

Then you can use:

ParseTree result = parser.expressionStart();

This forces ANTLR to keep parsing until it reaches the end of your input.
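To see why the trailing EOF matters, here is a minimal sketch in plain Java that does not use ANTLR at all. It is a hand-written stand-in for a rule that matches dotted names, and every name in it (`DottedNameDemo`, `parsePrefix`, `parseWhole`, `matchIdentifier`) is invented for illustration. `parsePrefix` behaves like a rule without EOF: it matches what it can and silently ignores the rest. `parseWhole` behaves like a rule that ends in EOF: leftover input becomes an error, collected in a list instead of being printed.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a toy recognizer, not ANTLR and not the Java8 grammar.
public class DottedNameDemo {

    /** Matches Identifier ('.' Identifier)* at the start of text.
     *  Returns the number of characters consumed, or -1 if nothing matched.
     *  Like a rule without EOF, it does not care what follows the match. */
    public static int parsePrefix(String text) {
        int pos = matchIdentifier(text, 0);
        if (pos < 0) {
            return -1;
        }
        while (pos < text.length() && text.charAt(pos) == '.') {
            int next = matchIdentifier(text, pos + 1);
            if (next < 0) {
                break; // a dangling '.' is left unconsumed
            }
            pos = next;
        }
        return pos;
    }

    /** Same match, but also requires end of input (the "EOF" part).
     *  Returns the collected error messages; an empty list means success. */
    public static List<String> parseWhole(String text) {
        List<String> errors = new ArrayList<>();
        int pos = parsePrefix(text);
        if (pos < 0) {
            errors.add("0: expected identifier");
        } else if (pos != text.length()) {
            errors.add(pos + ": unexpected '" + text.charAt(pos)
                    + "', expected end of input");
        }
        return errors;
    }

    /** Returns the index just past an identifier starting at start, or -1. */
    public static int matchIdentifier(String text, int start) {
        if (start >= text.length()
                || !Character.isJavaIdentifierStart(text.charAt(start))) {
            return -1;
        }
        int pos = start + 1;
        while (pos < text.length()
                && Character.isJavaIdentifierPart(text.charAt(pos))) {
            pos++;
        }
        return pos;
    }

    public static void main(String[] args) {
        System.out.println(parsePrefix("a.b(")); // consumed length; the '(' is ignored
        System.out.println(parseWhole("a.b("));  // one error about the leftover '('
        System.out.println(parseWhole("a.b"));   // no errors
    }
}
```

The point of the sketch is only the contrast between the two entry points: the prefix version reports success on `a.b(` even though a character is left over, while the whole-input version turns that leftover into a message the caller can show to a user.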


re: returnStatement

When I run `return a.b.c();` through the ANTLR Preview in IntelliJ, I get this parse tree:

[Image: parse tree for the returnStatement input]

A little bit of following the grammar rules, and I stumble across these rules:

typeName: Identifier | packageOrTypeName '.' Identifier;

packageOrTypeName
    : Identifier
    | packageOrTypeName '.' Identifier
    ;

That both rules include an alternative for packageOrTypeName '.' Identifier looks problematic to me.

In the tree, we see primaryNoNewArray_lfno_primary:2 which indicates a match of the second alternative in this rule:

primaryNoNewArray_lfno_primary
    : literal
    | typeName ('[' ']')* '.' 'class' // <-- trying to match this rule
    | unannPrimitiveType ('[' ']')* '.' 'class'
    | 'void' '.' 'class'
    | 'this'
    | typeName '.' 'this'
    | '(' expression ')'
    | classInstanceCreationExpression_lfno_primary
    | fieldAccess_lfno_primary
    | arrayAccess_lfno_primary
    | methodInvocation_lfno_primary
    | methodReference_lfno_primary
    ;

I'm out of time at the moment, but will keep looking at it. It seems pretty unlikely there's this obvious a bug in the Java8Parser.g4, but it certainly seems like a bug at the moment. I'm not sure what about the context would change how this is parsed (by "context" I mean the place where returnStatement is normally invoked within the grammar).

I tried this input, starting with the compilationUnit rule:

class Test {
    class A {
       public B  b;
    }
    class B {
        String c() {
            return "";
        }
    }
    String test() {
        A a = new A();
        return a.b.c();
    }
}

And it parses correctly (so, we've not found a major bug in the Java8Parser grammar 😔):

[Image: parse tree for the compilationUnit input]

Still, this doesn't seem right.

Getting closer:

If I start with the block rule and wrap the input in curly braces ({return a.b.c();}), it parses fine.

I'm going to go with the theory that ANTLR needs a bit more lookahead to resolve an "ambiguity".

Upvotes: 2
