Reputation: 75659
I am trying to get operator precedence correct in a Treesitter grammar. Treesitter is a LR1 parser generator.
I have a straightforward artithmetic grammar, which partially looks like this:
multiply_expression: $ => prec.left(2, seq(
$._expression,
'*',
$._expression,
)),
addition_expression: $ => prec.left(1, seq(
$._expression,
'+',
$._expression,
)),
This works correctly. multiply_expression
indeed gets a higher precedence than addition_expression
.
However, the precedence changes when I add an intermediate rule:
_partial_multi: $ => seq(
$._expression,
'*',
),
multiply_expression: $ => prec.left(2, seq(
$._partial_multi,
$._expression,
)),
I moved $.expression, '*'
to its own rule. To me, this seems to be an equivalent grammar, and I expect no changes. However, with this change the precedence is no longer correct. addition_expression
, which remained unchanged, seems to have a higher precedence than multiply_expression
.
Why does introducing an extra step change the precedence? Is there a name for this problem, or where can I find more information about it? When writing a grammar or fixing precedence problems, are there rules to follow or ways to think about this?
Upvotes: 1
Views: 263
Reputation: 1789
Here is your full grammar, for reproducibility:
module.exports = grammar({
name: 'github_example',
conflicts: $ => [],
rules: {
source_file: $ => $._expression,
_expression: $ => choice(
$.number,
$.multiply_expression,
$.addition_expression
),
number: $ => /\d+/,
_partial_multi: $ => seq(
$._expression,
'*',
),
multiply_expression: $ => prec.left(2, seq(
$._partial_multi,
$._expression,
)),
addition_expression: $ => prec.left(1, seq(
$._expression,
'+',
$._expression,
)),
}
});
You can fix this issue by adding precedence to the _partial_multi
rule and removing left-associative precedence from the multiply_expression
rule:
_partial_multi: $ => prec(2, seq(
$._expression,
'*',
)),
multiply_expression: $ => seq(
$._partial_multi,
$._expression,
),
What you've done here is make multiplication a right-associative operator of precedence 2. This is how you define left- or right-associativity in grammars which don't expose it as a primitive. You can make multiplication left-associative by writing it as follows:
_partial_multi: $ => prec(2, seq(
'*',
$._expression,
)),
multiply_expression: $ => seq(
$._expression,
$._partial_multi,
),
You've actually stumbled across something quite interesting, which is that you don't need explicit language constructs to define precedence & associativity in a grammar! They're just "syntactic sugar" to make the grammar easier to read & write. See this page for more information on how to specify precedence & associativity by decomposing rules. You can see that specifying precedence & associativity purely through grammar constructs is confusing, and almost reads backwards from what you expect unless you think about it! As you also discovered, mixing these two approaches (specifying precedence & associativity through language constructs and grammar constructs) can lead to confusing behavior. It is best to stick with one or the other.
Upvotes: 2