Reputation: 139
I am writing a handwritten recursive descent parser as a self exercise. I'd like to see if an iterative approach is possible. Generally speaking, I'd like to know what mindset I should have in order to transform interdependent recursive functions into an iterative solution.
In my current minimal example, I have a list of Tokens (which is simply a type with a lexeme) and they are consumed via recursive descent to build an abstract syntax tree represented by a unique_ptr
to an expression:
#include <string>
#include <memory>
#include <vector>
enum Type { Num, Add, Mul, LParen, RParen };
struct Token {
Type type;
std::string lexeme;
};
struct Expr {
Token token;
};
using AST = std::unique_ptr<Expr>;
struct Literal : public Expr {
double value;
};
struct Grouping : public Expr {
AST inner;
};
struct Binary : public Expr {
AST lhs, rhs;
};
using CIT = std::vector<Token>::const_iterator;
auto expression(CIT& it, CIT end) -> AST;
auto literal(CIT &it, CIT end) -> AST {
if (it != end and it->type == Type::Num) {
auto value = std::stod(it->lexeme);
auto token = *it++;
return std::make_unique<Literal>(Literal{ token, value });
}
else if (it != end and it->type == Type::LParen) {
auto token = *it++;
auto ast = std::make_unique<Grouping>(Grouping{ token, expression(it, end) });;
if (it != end and it->type == Type::RParen)
return ast;
else
throw "Mismatched parenthesis";
}
throw "Unable to parse literal";
}
auto multiplication(CIT &it, CIT end) -> AST {
auto ast = literal(it, end);
while (it != end and it->type == Type::Mul) {
auto token = *it++;
ast = std::make_unique<Binary>(Binary{ token, std::move(ast), literal(it, end) });
}
return ast;
}
auto addition(CIT &it, CIT end) -> AST {
auto ast = multiplication(it, end);
while (it != end and it->type == Type::Add) {
auto token = *it++;
ast = std::make_unique<Binary>(Binary{ token, std::move(ast), multiplication(it, end) });
}
return ast;
}
auto expression(CIT &it, CIT end) -> AST {
return addition(it, end);
}
int main() {
std::vector<Token> tokens = {
{ Type::Num, "5"},
{ Type::Add, "+"},
{ Type::LParen, "("},
{ Type::Num, "4"},
{ Type::Mul, "*"},
{ Type::Num, "3"},
{ Type::RParen, ")"},
};
auto it = tokens.begin();
auto ast = expression(it, tokens.end());
}
Here there is a circular dependency of recursive calls: addition
depends on multiplication
, multiplication
depends on literal
, and literal
'depends on' addition
.
I'd like to see if there is a way to flatten these calls into a singular iterative call. My first thoughts were to loop through the tokens and do a switch case between the precedence of operators; however, I am unsure where to go come there.
Non-complete attempt:
auto parse(const std::vector<Token>& tokens) -> AST {
AST current;
enum class Precedent {
Addition, Multiplication, Literal
};
for (const auto& token : tokens) {
switch (precedent) {
case Precedent::Addition: {
???
} break;
case Precedent::Multiplication: {
???
} break;
case Precedent::Literal: {
???
} break;
}
}
return current;
}
I feel that I am missing some kind of stack as a lot of iterative solutions from recursive solutions appear to maintain a manual call-like stack.
Any hints would be appreciated.
Edit: I've reviewed the post referring to the duplicate, though I believe my question is different that the one linked. I am not trying to make a single recursive function into an iterative one, I am trying to make multiple recursive functions into a single iterative function. I hope that explains why I asked this question.
Upvotes: 2
Views: 295
Reputation: 241731
You definitely need a stack in order to parse nested expressions. Recursive-descent parsers use the call stack, which is convenient because it frees you from needing to implement a managed stack datatype; the cost is that since you are not managing the stack, you rely on the host-language runtime to detect stack overflow. And the host language you've chosen (C++) doesn't do that.
Predictive (top-down) parsing can easily be implemented with an explicit stack. The stack, which contains grammar symbols, represents unresolved predictions. Initially, it consists of the start symbol and the end-of-input token. (The start symbol is on top.)
The parse algorithm is extremely simple:
If the top of the stack is a terminal type, then the next token must be a token of that type.
If it is, the input token is processed and the stack is popped; if the stack is now empty, the parse is complete.
If the input token is not of the expected type, the parse fails.
If the top of the stack is a non-terminal symbol, an appropriate production for that symbol is selected based on a peek at the next input token. (Or tokens, if your parser requires more than one lookahead.)
If there is a possible production, the stack is popped, and the right-hand side of the predicted production is pushed right-to-left onto the stack (so that the first symbol in the right-hand side is on the top of the stack).
If no production for the predicted symbol can start with the current lookahead, the parse fails.
A slightly more complicated prediction algorithm must be used if there are empty productions; an empty production is predicted based on its FOLLOW set, not its FIRST set.
In practice, you'll want to actually do something with each recognized terminal. That generally requires another stack (or stacks) and roughly corresponds to evaluating a forward Polish (prefix) expression.
Upvotes: 2