Reputation: 7960
I'm using the pest crate to implement a recursive grammar in Rust:
id = _{ ASCII_ALPHA_LOWER ~ (ASCII_ALPHANUMERIC|"_")* }
integer = _{ (ASCII_NONZERO_DIGIT ~ ASCII_DIGIT*)|"0" }
real = _{ ((integer ~ "." ~ ASCII_DIGIT*) | (integer? ~ "." ~ ASCII_DIGIT+)) ~ (("e"|"E") ~ ("-"|"+")? ~ ASCII_DIGIT+)? }
unaryop = _{ "sin"|"cos"|"tan"|"exp"|"ln"|"sqrt" }
inner_exp = _{ real|integer|"pi"|id }
exp = { SOI ~ ( inner_exp | (exp ~ ( "+"|"-"|"*"|"/"|"^" ) ~ inner_exp) | ("-" ~ exp) | ("(" ~ exp ~ ")") | (unaryop ~ "(" ~ exp ~ ")") ) ~ EOI }
However, I'm finding that pest isn't parsing the grammar as I would expect. For instance, 2+3
gives me an error of:
--> 1:2
|
1 | 2+3
| ^---
|
= expected EOI
It appears that the inner_exp
choice is being parsed and then, when the +
symbol is encountered, the parser doesn't know what to do. I'm pretty sure there's a problem with how I've written the exp ~ ( "+"|"-"|"*"|"/"|"^" ) ~ inner_exp
choice but I'm not sure what exactly is causing the problem. If I replace that choice with exp ~ ( "+"|"-"|"*"|"/"|"^" ) ~ exp
I get an error stating that the expression is left-recursive. How do I fix this grammar?
Upvotes: 2
Views: 2153
Reputation: 370082
The choice operator in PEGs is ordered and works as follows: Given e = {alt1 | alt2}
:
alt1
can be matched successfully, alt1
is applied and alt2
is never tried.alt2
is matchedalt2
can't match either, e
fails to matchNow e = {e1 ~ e2}
works as follows:
e1
can be matched and e2
can be matched after it, both are matched sequentially.e
fails to match.So if you have something like e = {(e1 | e2) ~ e3}
, the following will happen:
e1
can be matched:
e3
can be matched after e1
, both are matched sequentiallye
fails to matche1
fails to match, but e2
can be matched:
e3
can be matched after e2
, both are matched sequentiallye
fails to matchNotably if e1
succeeds and e3
fails, it does not go back and try to match e2
instead. So if both e1
and e2
can produce a match, but only e2
allows e3
to be matched afterwards, (e1 | e2) ~ e3
will fail whereas (e1 ~ e3) | (e2 ~ e3)
would succeed.
So in your grammar you have (inner_exp | ...) ~ EOI
. Now for your input inner_exp
produces a match, so per the above rules the other alternatives are never tried and it tries to match EOI
next. EOI
doesn't match, so the whole rule fails and you get the syntax error you get.
This explains the syntax error, but it's not the only problem your grammar has:
Your exp
rule is recursive, but it's anchored via SOI
and EOI
, so it can never match anything other than the entire input. This means that the recursive calls will necessarily always fail. To fix this you should remove SOI
and EOI
from the definition of exp
and instead have a main rule like start = {SOI ~ exp ~ EOI}
.
Once you've done this, you'll get an error that your exp
rule is now left-recursive, which pest does not support. To fix that, you can replace the left recursion with repetition like this (replacing both the inner_exp
and exp ~ (...) ~ inner_exp
alternatives) where operand
is a rule that matches the constructs other than infix operations:
operand ~ (( "+"|"-"|"*"|"/"|"^") ~ operand)*
Incidentally this will also fix your current issue because you now no longer have an inner_exp
alternative that's tried before the alternative for infix expressions.
Your last issue is that you're not taking operator precedence into account at all. You can fix that by introducing additional "levels" of expressions in addition to inner_exp
and exp
, so that only operators that have the same precedence are defined in the same rule and then each rule invokes the rule containing the next higher precedence to parse the operands. That would look like this:
exp = { summand ~ (("+" | "-") ~ summand)* }
summand = { factor ~ (("*" | "/" | "%") ~ factor)* }
factor = { unary ~ ("^" ~ unary)* }
unary = { "-" ~ unary | unaryop ~ "(" ~ exp ~ ")" | primary }
primary = { "(" ~ exp ~ ")" | real | integer | "pi" | id }
Upvotes: 6