Cod.ie

Reputation: 410

Antlr4 - representing syntactic predicates that examine an arbitrary number of tokens

In Antlr3, I have the following grammar:

ruleA:
    (ruleBStart) => ruleB
    | ruleC
;

For the sake of simplicity, let's assume ruleB is the grammar for a SQL SELECT statement, which may be nested inside an arbitrary number of LPARENs. This is easy to express in the old grammar by simply saying:

ruleBStart:
    (LPAREN)* SELECT
;
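For comparison, an ANTLR3 syntactic predicate such as (ruleBStart) => ruleB is implemented by backtracking: the generated parser marks the input position, tries to match ruleBStart, and rewinds afterwards whether or not the match succeeded. The Java sketch below mimics that behaviour over a hand-written token list. The class name, the token constants, and the la/matchRuleBStart/synpredRuleBStart helpers are hypothetical stand-ins for illustration, not ANTLR-generated code.

```java
import java.util.List;

// Hypothetical sketch of how a syntactic predicate like (ruleBStart) => ruleB
// behaves: match ruleBStart speculatively, then rewind the input.
public class SyntacticPredicateSketch {
    // Hypothetical token types for this example; not ANTLR-generated constants.
    static final int EOF = -1, LPAREN = 1, SELECT = 2, IDENT = 3;

    private final List<Integer> tokens;
    private int pos = 0; // index of the current (next unconsumed) token

    SyntacticPredicateSketch(List<Integer> tokens) { this.tokens = tokens; }

    // 1-based lookahead in the spirit of LA(1); returns EOF past the end.
    int la(int i) {
        int idx = pos + i - 1;
        return idx < tokens.size() ? tokens.get(idx) : EOF;
    }

    int position() { return pos; }

    // Speculatively match ruleBStart: (LPAREN)* SELECT, consuming tokens.
    boolean matchRuleBStart() {
        while (la(1) == LPAREN) pos++;
        if (la(1) != SELECT) return false;
        pos++;
        return true;
    }

    // The predicate itself: mark, speculate, always rewind.
    boolean synpredRuleBStart() {
        int mark = pos;
        boolean ok = matchRuleBStart();
        pos = mark; // the predicate never consumes input
        return ok;
    }

    public static void main(String[] args) {
        SyntacticPredicateSketch p1 =
            new SyntacticPredicateSketch(List.of(LPAREN, LPAREN, SELECT, IDENT));
        SyntacticPredicateSketch p2 =
            new SyntacticPredicateSketch(List.of(LPAREN, IDENT));
        System.out.println(p1.synpredRuleBStart() + " " + p2.synpredRuleBStart()); // true false
    }
}
```

The rewind is what makes this a predicate rather than a parse: after the check, ruleB (or ruleC) starts again from the original position.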

In Antlr4, to do the same thing, I would write a semantic predicate isRuleBStart(), which might look like this (pseudocode):

@parser::members{
    public boolean isRuleBStart(int tokenNum)
    {
        int token = _input.LA(tokenNum);
        if (token == EOF) return false; // handling EOF probably needs more work
        if (token == SELECT) return true;
        // tokenNum + 1, not tokenNum++: post-increment would pass the same index and recurse forever
        if (token == LPAREN) return isRuleBStart(tokenNum + 1);
        return false;
    }
}
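Because the predicate only ever inspects consecutive tokens, the recursion is not essential and can be replaced by a simple loop. A minimal standalone sketch, assuming an int array stands in for the token stream and a hypothetical la(int) helper mimics _input.LA(i) (1-based, EOF past the end); the class name and token constants are likewise hypothetical:

```java
// Standalone sketch: an int[] stands in for the parser's token stream,
// and la(i) mimics _input.LA(i) (1-based, returns EOF past the end).
public class IterativePredicateSketch {
    // Hypothetical token types chosen for this example only.
    static final int EOF = -1, LPAREN = 1, SELECT = 2, IDENT = 3;

    private final int[] tokens;

    IterativePredicateSketch(int... tokens) { this.tokens = tokens; }

    int la(int i) {
        return i - 1 < tokens.length ? tokens[i - 1] : EOF;
    }

    // Loop instead of recursion: skip any number of LPARENs, then require SELECT.
    // EOF (or any other token) ends the scan with false, so the loop always terminates.
    boolean isRuleBStart() {
        for (int i = 1; ; i++) {
            int t = la(i);
            if (t == SELECT) return true;
            if (t != LPAREN) return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(new IterativePredicateSketch(LPAREN, LPAREN, LPAREN, SELECT).isRuleBStart()); // true
        System.out.println(new IterativePredicateSketch(LPAREN, IDENT).isRuleBStart());                  // false
        System.out.println(new IterativePredicateSketch(LPAREN, LPAREN).isRuleBStart());                 // false (hits EOF)
    }
}
```

Inside the grammar's @parser::members, the same loop would call _input.LA(i) and use the generated token constants instead of these stand-ins.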

And then in my grammar, I would do:

ruleA:
    {isRuleBStart(1)}? ruleB
    | ruleC
;

This appears problematic to me since:

  1. It involves recursion in a construct that is already rumored to degrade performance.
  2. ruleBStart could get much more complicated if it had to check an arbitrary set of different tokens instead of just repeated LPARENs.
  3. It binds my code to the target language. So if I wanted to publish a parser in both Java and C++, I would have to reimplement this semantic predicate in each. (I know it is possible to write the semantic predicate carefully so that the same code works in Java and C++, but that is not the point.)

So I want to ask the community whether there is a proper Antlr4 way to achieve the same result.

Upvotes: 4

Views: 728

Answers (1)

Mike Lischke

Reputation: 53502

There's no need for a semantic predicate with ANTLR4. Its ALL(*) parsing algorithm performs unlimited lookahead when needed, so this decision requires no semantic predicate or any comparable hack.
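To see why unlimited lookahead is what matters here, the toy comparison below decides between ruleB and ruleC using either a fixed budget of k tokens or as many tokens as necessary. It is not ALL(*), which works by simulating the grammar's ATN; it only shows that with fixed k, enough leading LPARENs make the choice undecidable. It assumes, hypothetically, that ruleC can also begin with LPAREN (for example a parenthesized expression), and the class name and token constants are made up for the example.

```java
// Toy illustration (not ALL(*)): fixed-k lookahead vs. scanning as far as needed.
// Assumes, hypothetically, that ruleC may also start with LPAREN.
public class LookaheadDepthSketch {
    // Hypothetical token types for this example only.
    static final int EOF = -1, LPAREN = 1, SELECT = 2, IDENT = 3;

    // Decide using at most k tokens. Returns "ruleB", "ruleC",
    // or "ambiguous" when k tokens were not enough to tell.
    static String decideWithK(int[] tokens, int k) {
        for (int i = 0; i < k; i++) {
            int t = i < tokens.length ? tokens[i] : EOF;
            if (t == SELECT) return "ruleB";
            if (t != LPAREN) return "ruleC"; // includes EOF
        }
        return "ambiguous";
    }

    // Unbounded lookahead: a budget one past the input always reaches a decision,
    // because position tokens.length reads EOF.
    static String decideUnbounded(int[] tokens) {
        return decideWithK(tokens, tokens.length + 1);
    }

    public static void main(String[] args) {
        int[] input = {LPAREN, LPAREN, LPAREN, SELECT, IDENT};
        System.out.println(decideWithK(input, 2) + " " + decideUnbounded(input)); // ambiguous ruleB
    }
}
```

Any fixed k can be defeated by k leading LPARENs, which is exactly why the ANTLR3 grammar needed a syntactic predicate and why an algorithm that looks ahead as far as necessary does not.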

So just remove that predicate, and everything should work.

Upvotes: 1
