Federico Bellucci

Reputation: 675

How to parse JavaScript function expression calls with ANTLR?

I am building a JavaScript instrumentor with ANTLR, using the Patrick Hulsmeijer EcmaScript 3 grammar.

I'm having a problem parsing this line of code:

function(){}();

which is a direct call of a function expression. The parser treats the statement as a function declaration and then fails when it reaches the parentheses after the function body. The reason is that function declarations are given the highest precedence, to avoid the ambiguity with function expressions.

This is how the grammar recognizes function declarations:

sourceElement
options
{
    k = 1 ;
}
    : { input.LA(1) == FUNCTION }? functionDeclaration
    | statement
    ;

I am not even sure that this is a valid ECMAScript statement. Is it?
I think it would be more correct to write:

(function(){})();

which the parser handles correctly.
That is beside the point, however, because I have no control over the code I have to instrument.

I tried removing functionDeclaration from the sourceElement production and putting it in the statementTail production instead:

statementTail
    : variableStatement
    | emptyStatement
    | expressionStatement
    | functionDeclaration
    | ifStatement
    | ...
    ;

But a build error arises:

[fatal] rule statementTail has non-LL(*) decision due to recursive rule invocations reachable from alts 3,4. Resolve by left-factoring or using syntactic predicates or using backtrack=true option.
|---> : variableStatement

because the variableStatement production contains functionExpression as a descendant, which leads to an ambiguity. The parser cannot choose between functionDeclaration and functionExpression because they are almost identical:

functionDeclaration
    : FUNCTION name=Identifier formalParameterList functionBody
    -> ^( FUNCTIONDECL $name formalParameterList functionBody )
    ;

functionExpression
    : FUNCTION name=Identifier? formalParameterList functionBody
    -> ^( FUNCTIONEXPR $name? formalParameterList functionBody )
    ;

Note: I modified the original rewrite rules to use different tree nodes (FUNCTIONDECL and FUNCTIONEXPR) because I need to tell them apart while walking the AST.

How can I solve this ambiguity?

Upvotes: 1

Views: 2839

Answers (1)

Gunther

Reputation: 5256

The parser is right to expect a functionDeclaration when a sourceElement begins with the 'function' keyword. This implements the following restriction from the ECMAScript Language Specification:

an ExpressionStatement cannot start with the function keyword because that might make it ambiguous with a FunctionDeclaration.

The statement in question is thus invalid under this restriction, even though the grammar productions themselves do not make it ambiguous: because it omits the function identifier, it cannot be a functionDeclaration. A statement that does expose the syntactic ambiguity would be

function f(){}(42)

which according to the ECMAScript spec is a functionDeclaration, followed by an expressionStatement.
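This behavior can be checked in a JavaScript engine. The following is only a minimal sketch, assuming a runtime such as Node.js where `new Function` is permitted. The helper name `parses` is hypothetical; it only reports whether a source string is accepted as a function body.

```javascript
// Hypothetical helper: true if the engine accepts `source` as a function body,
// false if it raises a SyntaxError while parsing it.
function parses(source) {
  try {
    new Function(source); // parses the body without running it
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // anything else is unexpected, so let it propagate
  }
}

// The statement from the question: an ExpressionStatement starting with `function`.
const anonymousCall = parses("function(){}();");

// The ambiguous-looking variant: a declaration of f, then the expression statement (42).
const declarationThenExpression = parses("function f(){}(42)");

// f is only declared, never invoked, so the counter is left untouched.
const callCount = new Function(
  "var n = 0; function f(){ n++; }(42); return n;"
)();

console.log(anonymousCall, declarationThenExpression, callCount);
```

In a conforming engine this should log `false true 0`: the anonymous form is rejected, the named form is accepted, and `f` is never called because `(42)` is parsed as a separate statement.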

So the best thing to do is to ask the provider of this code for correct syntax. You said that you need to parse it anyway; that could possibly be done using ANTLR's backtracking. Make sure the function identifier is mandatory in the functionDeclaration, and have the parser try a functionDeclaration before a statement. But be aware that, even if this helps for the original statement, it will fail for

function f(){}()

because here the functionDeclaration can be completed successfully, but there is no valid statement following it.
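A JavaScript engine rejects this last form too, which can be confirmed with a quick check. This is a sketch under the assumption that `new Function` is available (for example in Node.js); `syntaxErrorFor` is a hypothetical helper and has nothing to do with ANTLR or the grammar.

```javascript
// Hypothetical helper: returns the SyntaxError raised while parsing `source`
// as a function body, or null if the source is accepted.
function syntaxErrorFor(source) {
  try {
    new Function(source); // parse only, the body is never executed
    return null;
  } catch (e) {
    if (e instanceof SyntaxError) return e;
    throw e;
  }
}

// The declaration is complete at `}`; the leftover `()` is not a valid statement.
const namedCall = syntaxErrorFor("function f(){}()");

// Wrapping the function in parentheses makes it an expression, so the call is valid.
const parenthesizedCall = syntaxErrorFor("(function f(){})()");

console.log(namedCall !== null, parenthesizedCall !== null);
```

The wording of the error message differs between engines, so the sketch only checks whether an error occurred, not what it says.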

Upvotes: 2
