Marplesoft

Reputation: 6240

Unexpected parser rule matching order

With the following (subset of a) grammar for a scripting language:

expr
    ...
    | 'regex(' str=expr ',' re=expr ')'  #regexExpr
    ...

an expression like regex('s', 're') parses to the following tree, which makes sense:

regexExpr
   'regex('
   expr: stringLiteral ('s')
   ','
   expr: stringLiteral ('re')
   ')'

I'm now trying to add an optional third argument to my regex function, so I've used this modified rule:

'regex(' str=expr ',' re=expr (',' n=expr )? ')'

This causes regex('s', 're', 1) to be parsed in a way that's unexpected to me:

regexExpr
   'regex('
   expr:listExpression
      expr: stringLiteral ('s') 
      ','
      expr: stringLiteral ('re')
   ','
   expr: integerLiteral(1)
   ')'

where listExpression is another alternative (labeled listExpr) defined below regexExpr:

expr
    ...
    | 'regex(' str=expr ',' re=expr (',' n=expr)? ')' #regexExpr
    ...
    | left=expr ',' right=expr                        #listExpr
    ... 

I think this listExpr could have been defined better (by defining surrounding tokens), but I've got compatibility concerns with changing it now.

I don't understand the parser rule matching precedence here. Is there a way I can add the optional third arg to regex() without causing the first two args to be parsed as a listExpr?

Upvotes: 1

Views: 32

Answers (1)

Bart Kiers

Reputation: 170158

Try defining them as two separate alternatives that share the same label, #regexExpr:

expr
 : 'regex' '(' str=expr ',' re=expr ',' n=expr ')' #regexExpr
 | 'regex' '(' str=expr ',' re=expr ')'            #regexExpr
 | left=expr ',' right=expr                        #listExpr
 | ...
 ;
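
A possible explanation for why this ordering helps, offered as a sketch rather than as part of the original answer: with the optional (',' n=expr)? version, regex('s', 're', 1) is ambiguous, because the first argument can be either 's' alone or the listExpr 's', 're'. ANTLR 4 resolves ambiguities by picking the lowest-numbered viable choice, which here appears to favor extending the first argument into a listExpr. With two separate alternatives, the three-argument form is listed first, so it should win when both alternatives could match. The grammar name, the prog entry rule, and the lexer rules below are assumptions added only to make the fragment self-contained; they are not taken from the original grammar.

```antlr
grammar RegexDemo;  // hypothetical grammar name

// Hypothetical entry rule: one expression followed by end of input.
prog : expr EOF ;

expr
 : 'regex' '(' str=expr ',' re=expr ',' n=expr ')' #regexExpr       // three-argument form, listed first
 | 'regex' '(' str=expr ',' re=expr ')'            #regexExpr       // two-argument form
 | left=expr ',' right=expr                        #listExpr
 | STRING                                          #stringLiteral
 | INT                                             #integerLiteral
 ;

// Assumed lexer rules, just enough to tokenize regex('s', 're', 1).
STRING : '\'' ~['\r\n]* '\'' ;
INT    : [0-9]+ ;
WS     : [ \t\r\n]+ -> skip ;
```

Because both alternatives share the #regexExpr label, ANTLR generates a single RegexExprContext for them. Its n field should be null when the two-argument form matched, so a visitor or listener can check for that to detect a missing third argument.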

Upvotes: 1
