Reputation: 146940
I've been writing some recursive ascent parsers, and one of the things I've been struggling with is left recursion. It seems obvious to me that right recursion can be expressed recursively, like
addExpr
: primaryExpr '+' addExpr
| primaryExpr;
by something along the lines of
parseAddExpr() {
auto x = parsePrimaryExpr();
if (next_token == '+') {
auto result = make_unique<PlusExpr>();
result->lhs = x;
result->rhs = parseAddExpr();
return std::move(result);
}
return std::move(x);
}
But for left recursion, all I could come up with is a while loop.
mulExpr
: mulExpr '*' addExpr
| addExpr;
parseMulExpr() {
auto expr = parseAddExpr();
while(next_token == '*') {
auto mul_expr = make_unique<MulExpr>();
mul_expr->lhs = std::move(expr);
mul_expr->rhs = parseAddExpr();
expr = std::move(mul_expr);
}
return std::move(expr);
}
I mean, this functions fine, but I always felt that it should have a recursive version. Is it possible to implement a left-associative operator in a recursive, instead of iterative, fashion?
Upvotes: 6
Views: 1493
Reputation: 385700
Your functions are recursive descent, not recursive ascent. The problem that recursive descent parsers have with left recursion, which you have encountered, is very well known and studied. Any compilers course or textbook that covers parsing will discuss the problem and ways to solve it.
Using iteration is a perfectly normal, valid way to handle it. See these course notes for example. In those notes, the rule T -> T '*' S | T '/' S | S
(which is your mulExpr
rule with division added) is transformed into the rule T -> S { ('*' | '/') S }
, where the braces {...}
mean “zero or more repetitions”.
Based on your comment, I think you have some confusion about what “recursive descent” means and what “recursive ascent” means.
The basic idea of recursive descent is to create one function for each nonterminal in the grammar. The job of the function corresponding to some nonterminal A is to completely parse one instance of A if possible, calling as necessary the functions for the nonterminals appearing on the right-hand side of A's productions in the grammar.
So, for example, your grammar has a nonterminal addExpr
with these two productions:
addExpr -> primaryExpr '+' addExpr
addExpr -> primaryExpr
Therefore a recursive descent parser will have a function for addExpr
that tries to match the right-hand side of one of the addExpr
productions, calling the functions for primaryExpr
and addExpr
(itself!) as necessary because those nonterminals appear in addExpr
's productions.
And indeed this is exactly what you have in your parseAddExpr
function. It looks for a way to match one of the addExpr
productions, calling parsePrimaryExpr
and parseAddExpr
as necessary.
Recursive ascent is a (very uncommon) way to implement LR parsing. An LR parser has a state table, with a row for each state and a column for each terminal. In a recursive ascent parser, we don't represent the table as data. Instead, we create one function for each state, and that state's row is embodied as a switch statement in the function.
In an LR parser, there is not normally a one-to-one correspondence between states and nonterminals, or between states and productions. Generally there will be more states than productions. Each state represents a set of positions within productions.
Looking at the functions in your post, I see no evidence that you've constructed or embodied a state table. What I see is a set of functions that correspond directly to the nonterminals of your grammar. That correspondence is the hallmark of recursive descent.
Also, the fact that you're encountering trouble with left recursive productions is a dead giveaway that you're using recursive descent. LR parsers do not have problems with left recursion and recursive descent parsers do.
Upvotes: 10