Maks Karashchuk

Reputation: 13

Lexer rule is recognized where it wasn't needed

I'm trying to use ANTLR 4 to create a simple grammar for some SELECT statements in Oracle DB, and I've run into a small problem. I have the following grammar:

Grammar & Lexer

column
: (tableAlias '.')? IDENT ((AS)? colAlias)?
| expression ((AS)? colAlias)?
| caseWhenClause ((AS)? colAlias)?
| rankAggregate ((AS)? colAlias)?
| rankAnalytic colAlias
;

colAlias
: '"' IDENT '"'
| IDENT
;

rankAnalytic
: RANK '(' ')' OVER '(' queryPartitionClause orderByClause ')'
;

RANK: R A N K;
fragment A:('a'|'A');
fragment N:('n'|'N');
fragment R:('r'|'R');
fragment K:('k'|'K');

The most important part is the rankAnalytic alternative in the column rule. I declared that a colAlias must follow the rank expression, but when that alias is itself named "rank" (without quotes), it is recognized as the RANK lexer token rather than as a colAlias.

So for example in case I have the following text:

 SELECT fulfillment_bundle_id, SKU, SKU_ACTIVE, PARENT_SKU, SKU_NAME, LAST_MODIFIED_DATE,
 RANK() over (PARTITION BY fulfillment_bundle_id, SKU, PARENT_SKU 
 order by ACTIVE DESC NULLS LAST,SKU_NAME) rank

The "rank" alias is underlined and flagged as an error with the following message:
mismatched input 'rank' expecting {'"', IDENT}
The point is that I don't want it to be recognized as the RANK keyword here, only as an alias for the column.
I'm open to suggestions :)

Upvotes: 1

Views: 54

Answers (1)

GRosenberg

Reputation: 6001

The RANK rule apparently appears above the IDENT rule. When two lexer rules match the same text, ANTLR picks the one defined first, so the lexer always emits the string "rank" as a RANK token and never as an IDENT token.
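
A minimal sketch of the ordering this implies. The question does not show its IDENT rule, so the definition below is a hypothetical stand-in. An ANTLR 4 lexer prefers the longest match and, on a tie, the rule defined first; both rules match "rank" with the same length, so RANK wins:

```
// Longest match wins; when two rules match the same text, the earlier rule wins.
RANK  : R A N K ;                    // from the question; defined first, so it wins for "rank"
IDENT : [a-zA-Z_] [a-zA-Z0-9_]* ;    // hypothetical: the question's real IDENT rule is not shown

fragment A : [aA] ;
fragment N : [nN] ;
fragment R : [rR] ;
fragment K : [kK] ;
```

Moving IDENT above RANK would not help: "rank" would then always be emitted as IDENT, and the rankAnalytic rule could no longer match the RANK keyword.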

A simple fix is to change the colAlias rule:

colAlias
    : '"' ( IDENT | RANK ) '"'
    | ( IDENT | RANK ) 
    ;

OP added:

OK, but what if RANK is not the only such lexer rule and I have a whole list (>100) of keywords like it? What am I supposed to do?

If colAlias can be literally anything, then let it:

colAlias
    : '"' .+? '"'    // must quote if multiple
    | .              // one token
    ;

If that definition would incur ambiguities, a predicate is needed to qualify the match:

colAlias
    : '"' m+=.+? '"' { check($m) }?  // multiple
    | o=.            { check($o) }?  // one 
    ;

Functionally, the predicate is just another element in the subrule.
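
Between listing every keyword inline and accepting any token, a middle ground often used in SQL grammars is a dedicated parser rule that names the keywords allowed to act as identifiers. A minimal sketch, assuming the hypothetical rule names identifier and nonReserved (neither appears in the question's grammar):

```
colAlias
    : '"' identifier '"'
    | identifier
    ;

// Hypothetical helper rule: an ordinary identifier or a keyword used as a name.
identifier
    : IDENT
    | nonReserved
    ;

// Hypothetical list of keywords that may also be used as names.
// Only RANK is taken from the question; add further keyword tokens as alternatives.
nonReserved
    : RANK
    ;
```

Each keyword is then listed once, and any rule that accepts a name can reference identifier instead of IDENT. Keywords whose use as a name would make the grammar ambiguous can be left out of nonReserved.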

Upvotes: 1
