Antlr4 - representing syntactic predicates that examine arbitrary number of tokens

Question

In Antlr3, I have the following grammar:

ruleA:
    (ruleBStart) => ruleB
    | ruleC
;

For the sake of simplicity, let's assume ruleB is the grammar for an SQL SELECT statement, which may be nested inside an arbitrary number of LPARENs. This is easy to express in the old (Antlr3) grammar by simply writing:

ruleBStart:
    (LPAREN)* SELECT
;

Antlr4 has no syntactic predicates, so to do the same thing I would write a semantic predicate, isRuleBStart(), which might look like this (Java target):

@parser::members{
    public boolean isRuleBStart(int tokenNum)
    {
        int token = _input.LA(tokenNum);
        if (token == EOF) return false; // handling EOF probably needs more work
        if (token == SELECT) return true;
        // skip this LPAREN and examine the next token (tokenNum++ would pass the same index again)
        if (token == LPAREN) return isRuleBStart(tokenNum + 1);
        return false;
    }
}
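For comparison, the same check can be written as a loop instead of recursion. This is a minimal sketch: to keep it self-contained and runnable outside a generated parser, it assumes the token stream is a `List<Integer>` of token types, with the hypothetical helper `la(...)` standing in for `_input.LA(...)`. The `RuleBStartCheck` class and the token-type values are also hypothetical, and `List.of` needs Java 9+.

```java
import java.util.List;

public class RuleBStartCheck {
    public static final int EOF = -1;    // ANTLR uses -1 for EOF
    public static final int LPAREN = 1;  // hypothetical token type
    public static final int SELECT = 2;  // hypothetical token type
    public static final int ID = 3;      // hypothetical token type

    // Stand-in for _input.LA(i): 1-based lookahead, EOF past the end of the list.
    static int la(List<Integer> tokens, int i) {
        return i - 1 < tokens.size() ? tokens.get(i - 1) : EOF;
    }

    // Iterative equivalent of isRuleBStart: skip any number of LPARENs, then require SELECT.
    // EOF is neither LPAREN nor SELECT, so running out of tokens simply yields false.
    public static boolean isRuleBStart(List<Integer> tokens, int start) {
        int i = start;
        while (la(tokens, i) == LPAREN) {
            i++;
        }
        return la(tokens, i) == SELECT;
    }

    public static void main(String[] args) {
        System.out.println(isRuleBStart(List.of(LPAREN, LPAREN, SELECT, ID), 1)); // true
        System.out.println(isRuleBStart(List.of(LPAREN, ID), 1));                 // false
        System.out.println(isRuleBStart(List.of(LPAREN, LPAREN), 1));             // false (hits EOF)
    }
}
```

Inside a generated parser the body would presumably shrink to `while (_input.LA(i) == LPAREN) i++; return _input.LA(i) == SELECT;`, since lookahead past the end of the stream keeps returning EOF.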

And then in my grammar, I would do:

ruleA:
    {isRuleBStart(1)}? ruleB
    | ruleC
;

This seems problematic to me because:

- It involves recursion, in a construct (semantic predicates) that is already rumored to degrade performance.
- The predicate could get much more complicated if ruleBStart had to check an arbitrary set of different tokens instead of just a repeated LPAREN.
- It binds my grammar to a target language. If I wanted to publish the parser in both Java and C++, I would have to reimplement this semantic predicate in each. (I know it is possible to write the predicate carefully so the same code works in both Java and C++, but that is not the point.)
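On the second point, one way to keep such a predicate from growing with every new case is to make it data-driven: skip any number of tokens from one set, then require a token from a second set. That covers lookahead shaped like `(A | B)* (C | D)`, though not arbitrary sub-grammars. This is only a sketch under the same stand-in assumptions as before (a `List<Integer>` of token types, 1-based start position); `matchesPrefix`, `PrefixPredicate`, and the `WITH`/`HINT`/`ID` token types are hypothetical, not part of any ANTLR API. It does nothing for the third point, since the sets still live in target-language code.

```java
import java.util.List;
import java.util.Set;

public class PrefixPredicate {
    public static final int EOF = -1;
    // Hypothetical token types for illustration only.
    public static final int LPAREN = 1, SELECT = 2, WITH = 3, ID = 4, HINT = 5;

    // True if, starting at 1-based position `start`, the tokens look like
    // (any token in `skippable`)* followed by one token in `starters`.
    public static boolean matchesPrefix(List<Integer> tokens, int start,
                                        Set<Integer> skippable, Set<Integer> starters) {
        int i = start - 1; // convert to a 0-based index
        while (i < tokens.size() && skippable.contains(tokens.get(i))) {
            i++;
        }
        int next = i < tokens.size() ? tokens.get(i) : EOF;
        return starters.contains(next);
    }

    public static void main(String[] args) {
        Set<Integer> skip = Set.of(LPAREN, HINT);
        Set<Integer> start = Set.of(SELECT, WITH);
        System.out.println(matchesPrefix(List.of(LPAREN, HINT, LPAREN, WITH, ID), 1, skip, start)); // true
        System.out.println(matchesPrefix(List.of(LPAREN, ID, SELECT), 1, skip, start));             // false
        // The original (LPAREN)* SELECT check is the special case below.
        System.out.println(matchesPrefix(List.of(LPAREN, LPAREN, SELECT), 1,
                Set.of(LPAREN), Set.of(SELECT)));                                                  // true
    }
}
```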

So my question to the community is: is there a proper Antlr4 way to achieve the same result?
