Riaz

Can ANTLR semantic predicates access grammar symbols?

The ANTLR book has the following sample code, which uses semantic predicates to resolve a grammar ambiguity:

// predicates/PredCppStat.g4
@parser::members {
  Set<String> types = new HashSet<String>() {{add("T");}};
  boolean istype() { return types.contains(getCurrentToken().getText());}
}
stat:   decl ';'  {System.out.println("decl "+$decl.text);}
    |   expr ';'  {System.out.println("expr "+$expr.text);}
    ;
decl:   ID ID
    |   {istype()}? ID '(' ID ')'
    ;
expr:   INT
    |   ID
    |   {!istype()}? ID '(' expr ')'
    ;
ID  :   [a-zA-Z]+ ;
INT :   [0-9]+ ;
WS  :   [ \t\n\r]+ -> skip ;

Here, each predicate sits at the left edge of an alternative, so it is evaluated before the parser commits to that alternative and decides whether the alternative is viable. It makes that decision using getCurrentToken().

However, suppose we alter the grammar slightly to use hierarchical (dotted) names instead of a simple ID:

decl:   ID ID
    |   {istype()}? hier_id '(' ID ')'
    ;
expr:   INT
    |   ID
    |   {!istype()}? hier_id '(' expr ')'
    ;
hier_id : ID ('.' ID)* ;

Then istype() can no longer base its decision on getCurrentToken() alone: that call returns only the first ID of the hier_id, whereas the predicate needs the entire chain of tokens to determine whether the name is a type.
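To make the problem concrete, here is a minimal Java sketch that runs without the ANTLR runtime. It models the token stream as a list of strings, and `lt(k)` is a hypothetical stand-in for a 1-based lookahead like ANTLR's `TokenStream.LT(k)`. It compares the book's single-token check with one that reads the whole `ID ('.' ID)*` chain. The class and method names are illustrative only. Whether such multi-token logic can be wired into a left-edge predicate in ANTLR 4.6 is exactly what the question asks; the sketch only models the decision.

```java
import java.util.List;
import java.util.Set;

// Models the parser's view of the input at the start of decl/expr.
// Tokens are plain strings; lt(k) is a hypothetical stand-in for a 1-based
// lookahead such as ANTLR's TokenStream.LT(k), so this runs without the runtime.
final class HierIdPeek {
    private final List<String> tokens;
    private final int index; // position of the current (next unconsumed) token

    HierIdPeek(List<String> tokens, int index) {
        this.tokens = tokens;
        this.index = index;
    }

    // 1-based lookahead; returns "<EOF>" past the end of the input.
    String lt(int k) {
        int i = index + k - 1;
        return i < tokens.size() ? tokens.get(i) : "<EOF>";
    }

    // Same shape as the grammar's lexer rule  ID : [a-zA-Z]+ ;
    static boolean isId(String text) {
        return text.matches("[a-zA-Z]+");
    }

    // The book's check: only the current token is consulted.
    boolean istypeSingleToken(Set<String> types) {
        return types.contains(lt(1));
    }

    // Reads the text of  hier_id : ID ('.' ID)* ;  without consuming anything.
    String peekHierId() {
        if (!isId(lt(1))) {
            return "";
        }
        StringBuilder name = new StringBuilder(lt(1));
        int k = 1;
        while (lt(k + 1).equals(".") && isId(lt(k + 2))) {
            name.append('.').append(lt(k + 2));
            k += 2;
        }
        return name.toString();
    }

    // Checks the whole dotted name against the symbol table.
    boolean istypeHierId(Set<String> types) {
        return types.contains(peekHierId());
    }
}
```

For the input `a.b(x);` with `types = {"a.b"}`, `istypeSingleToken` sees only `a` and returns false, while `istypeHierId` sees `a.b` and returns true.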

That means we will need to do one of the following:

(1) Put the predicate after hier_id and access the matched hier_id from istype(). Is this possible? I tried it, but the generated code fails to compile.

(2) Break up the grammar into sub-rules and place istype() after the hier_id tokens are consumed. But this would wreck the readability of the grammar, and I would rather not do it.

What is the best way to solve this problem? I am using antlr-4.6.

Answers (1)

R71

One solution is to let ID itself contain '.', which effectively turns hier_id into a lexer token. The predicate's call to getCurrentToken() then sees the full chain of names as a single token.

Note that once hier_id becomes a lexer token, it subsumes ID, and that comes at a cost. If the grammar has other places that accept only a plain ID (and it probably will), you have to add predicates in all of them to avoid false matches, which will slow down the parser.
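The trade-off described above can be sketched in plain Java. The regular expression imitates an assumed lexer rule along the lines of `ID : [a-zA-Z]+ ('.' [a-zA-Z]+)* ;` (the answer does not spell out the rule), `tokenize` drops whitespace the way `WS -> skip` does, and `isSimpleId` stands for the kind of extra predicate the answer says every plain-ID reference would need. This is not generated ANTLR code, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the "dotted ID as one lexer token" idea; the pattern mirrors a
// hypothetical rule  ID : [a-zA-Z]+ ('.' [a-zA-Z]+)* ;  plus INT, '(' ')' ';' and WS.
final class DottedIdLexing {
    private static final Pattern TOKEN =
        Pattern.compile("[a-zA-Z]+(?:\\.[a-zA-Z]+)*|[0-9]+|[();]|[ \\t\\n\\r]+");

    // Splits the input into token texts, skipping whitespace (like WS -> skip).
    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("unexpected character at " + pos);
            }
            String text = m.group();
            if (!Character.isWhitespace(text.charAt(0))) {
                out.add(text);
            }
            pos = m.end();
        }
        return out;
    }

    // With dotted names lexed as one token, the current token holds the whole name.
    static boolean istype(String currentTokenText, Set<String> types) {
        return types.contains(currentTokenText);
    }

    // Extra guard needed wherever the grammar means a plain, undotted ID.
    static boolean isSimpleId(String currentTokenText) {
        return currentTokenText.indexOf('.') < 0;
    }
}
```

Here `a.b.T(x);` is lexed as `a.b.T ( x ) ;`, so a predicate that inspects only the current token already sees the whole name; the price is that a check like `isSimpleId` must guard every place where only an undotted ID is legal.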

So I guess the question in its general sense (i.e., how can alternatives be restricted by predicates when the current token alone is not enough to make the decision) still needs to be answered by ANTLR 4 experts.
