Reputation: 16877

avoid stackoverflow in recursive algorithm in recursive-descent parser

I worked on a parser-related project and I implemented it using recursive descent parser. The problem however, it could easily cause stack overflow. What are the techniques to handle this type of problem?

For illustration, here's simple math expression parser, with addition, subtraction, multiplication and division support. Grouping parentheses could be used, and they trigger recursion obviously.

here's full code:

#include <string>
#include <list>
#include <iostream>

using namespace std;

struct term_t;
typedef list<term_t> prod_t;
typedef list<prod_t> expr_t;
struct term_t
{
    bool div;
    double value;
    expr_t expr;
};

double eval(const expr_t &expr);
double eval(const term_t &term)
{
    return !term.expr.empty() ? eval(term.expr) : term.value;
}
double eval(const prod_t &terms)
{
    double ret = 1;
    for (const auto &term : terms)
    {
        double x = eval(term);
        if (term.div)
            ret /= x;
        else
            ret *= x;
    }
    return ret;
}
double eval(const expr_t &expr)
{
    double ret = 0;
    for (const auto &prod : expr)
        ret += eval(prod);
    return ret;
}

class expression
{
public:
    expression(const char *expr) : p(expr)
    {
        prod();
        for (;;)
        {
            ws();
            if (!next('+') && *p != '-') // treat (a-b) as (a+-b)
                break;
            prod();
        }
    }
    operator const expr_t&() const
    {
        return expr;
    }

private:
    void term()
    {
        expr.back().resize(expr.back().size() + 1);
        term_t &t = expr.back().back();
        ws();
        if (next('('))
        {
            expression parser(p);  // recursion
            p = parser.p;
            t.expr.swap(parser.expr);
            ws();
            if (!next(')'))
                throw "expected ')'";
        }
        else
            num(t.value);
    }
    void num(double &f)
    {
        int n;
        if (sscanf(p, "%lf%n", &f, &n) < 1)
            throw "cannot parse number";
        p += n;
    }
    void prod()
    {
        expr.resize(expr.size() + 1);
        term();
        for (;;)
        {
            ws();
            if (!next('/') && !next('*'))
                break;
            term();
        }
    }
    void ws()
    {
        while (*p == ' ' || *p == '\t')
            ++p;
    }
    bool next(char c)
    {
        if (*p != c)
            return false;
        ++p;
        return true;
    }

    const char *p;
    expr_t expr;
};

int main()
{
    string expr;
    while (getline(cin, expr))
        cout << "= " << eval(expression(expr.c_str())) << endl;
}

if you run, you can type simple math expressions like 1+2*3+4*(5+6*7) and will correctly calculate 195. I also added simple expression evaluation, it also causes recursion and causes stack overflow even easier than parsing. Anyways, the parsing itself is simple and obvious, how can I rewrite it without making huge changes in the code and avoid recursion completely? In my case, I use expression similar to this one (((((1))))) to cause recursion and if I have only a couple of hundred parentheses I'll get stack overflow. If I step through with debugger (in Visual Studio) recursion tree if only three functions: [term ->] expression ctor -> prod -> term and from register inspection these three functions take 700-1000 bytes of stack space. With optimization settings and fiddling code a bit I can make it take less, and with compiler settings I can increase stack space, or in this case I could also use Dijksta's shunting-yard algorithm but that's not the point of the question: I want to know how to rewrite it to avoid recursion and at the same time, if possible, without completely rewriting parsing code.

Upvotes: 6

Answers (3)

Matt Timmermans

Reputation: 59253

The common practice for recursive descent parsers is to recurse into subexpressions, non-terminals, or nested constructs, but not to use recursion to continue parsing at the same level. This makes the stack size a limit on the maximum "depth" of the string you can parse, but not on its length.

It looks like you did that part right, so lets look at the typical numbers...

Because of the stack-based limitation, the recursive parsing functions are usually written so that they don't use a lot of stack -- 128 bytes or so would be a high average.

So if you have, say, 128K of stack space (which will usually mean your stack is already 90% full), then you should be able to get 1000 levels or so, and that is plenty for real world texts that programmers actually type.

In your case, you are getting only 200 levels on the stack. This is probably also OK for real life, but unless you are running in a very restricted hardware environment, it indicates that you're just using too much stack space in your recursive functions.

I don't know the size of the whole class, but I would guess that the main problem is in term() where you put a whole new expression on the stack with the expression parser(p); declaration. This is very unusual and looks like it could take a lot of space. You should probably avoid making this whole new object.

Print out sizeof(expression) to see how big this really is.

Upvotes: 5

rici

Reputation: 241871

A recursive-descent parser is necessarily recursive; the name is not capricious.

If a production is right-recursive then its corresponding recursive-descent action is tail-recursive. So with an appropriate grammar, you could produce a tail-recursive parser, but the production for parenthesized expressions will be difficult to shoehorn into that constraint. (And see below.)

You can simulate recursion by maintaining a simulated call-stack but the stack-manipulation will probably overwhelm the simplicity of the recursive descent parser. In any case, there are simpler iterative algorithms which use an explicit parse stack, so it might make more sense to use one of them. But that would not answer the question.

NOTE: If you use C++, then you have to jump through some hoops to create a tail context. In particular, if you allocate an object with a non-trivial destructor (such as std::list), then the automatic destructor call occurs in the tail context and the last explicit function call is not a tail call.

Upvotes: 4

gnasher729

Reputation: 52602

For parsing expressions, have a look at operator precedence parsing, for example http://epaperpress.com/oper/download/OperatorPrecedenceParsing.pdf . It parses expressions in a simple loop using a data stack. The only space needed for 200 nested parentheses is 200 entries in a data stack.

There are languages where new operators can be added at runtime, with the compiled program specifying associativity and precedence of those operators, something that couldn't be handled with a recursive decent parser.

Upvotes: 1

avoid stackoverflow in recursive algorithm in recursive-descent parser

Answers (3)

Related Questions