How does this Recursive Descent Parser match specific operators?

Question

I likely just need to step away from the computer for a moment, but I'm puzzled by one detail of how this recursive descent parser works. It comes from Crafting Interpreters by Bob Nystrom, which I'm following along with. The parser handles expressions and uses the call stack to implement precedence. We start with the expression function, which expands to equality (the lowest level of precedence). That immediately calls comparison (operators at the next level of precedence), and so on:

private Expr expression() {
    return equality();
}

private Expr equality() {
    Expr expr = comparison();

    // Loop for as long as the next token is one of the two equality operators.
    // match() consumes that token, so previous() returns the operator ('!=' or '==').
    while (match(NOT_EQUAL, EQUAL_EQUAL)) {
        Token operator = previous();
        Expr right = comparison();
        expr = new Expr.Binary(expr, operator, right);
    }

    return expr;
}

private Expr comparison() {
    Expr expr = term();

    while (match(GREATER_THAN, GREATER_EQUAL, LESS_THAN, LESS_EQUAL)) {
        Token operator = previous();
        Expr right = term();
        expr = new Expr.Binary(expr, operator, right);
    }

    return expr;
}

Which all makes sense and is perfectly fine. It keeps going through PLUS and MINUS (term), factor, and unary expressions until we hit the lowest level, which has the highest precedence, primary:

private Expr primary() {
    // Literals are all very straightforward - they just get their corresponding value
    if (match(FALSE)) {
        return new Expr.Literal(false);
    }

    if (match(TRUE)) {
        return new Expr.Literal(true);
    }

    if (match(NIL)) {
        return new Expr.Literal(null);
    }

    if (match(NUMBER, STRING)) {
        return new Expr.Literal(previous().literal);
    }

    // Parse the expression inside the parentheses, then require a closing ')'.
    // An opening parenthesis without a matching closing parenthesis is an error.
    if (match(LEFT_PAREN)) {
        Expr expr = expression();
        consume(RIGHT_PAREN, "Expecting to find ')' after expression!");
        return new Expr.Grouping(expr);
    }

    // If we got right down to primary() and still nothing matches, then the token we have cannot possibly be the
    // start of an expression. In this case, throw an error.
    throw error(peek_this(), "Expected expression.");
}

If we still haven't matched an operator by this point, then we throw an error. Otherwise, we would have built the expression and returned it back up the call stack at some earlier point. This all makes sense, and it also works and runs absolutely fine.

My sticking point is this: how do we ever actually match any operator? Let's say for example that the current token is a GREATER_THAN ('>'). The function that should find it and build the new expression with it, comparison, calls term before doing that, so we end up just dropping straight down to primary. Obviously we aren't going to match '>' anywhere else, so how do we not end up just throwing an error for it? Or for any operator token, for that matter?

As I said above, this does actually work as expected, so I'm probably just making one stupid mistake or missing a detail as to how. Normally I have an epiphany while trying to explain something in an SO post, but it hasn't clicked yet, so I would much appreciate an explanation of what is going on here. Thank you!

EDIT: I might understand now, but would like some confirmation. I think the key is that the unary function, which matches unary ! and - tokens, only calls primary after checking to see if we have one of those tokens:

private Expr unary() {
    if (match(EXCLAMATION, MINUS)) {
        Token operator = previous();
        Expr right = unary();
        return new Expr.Unary(operator, right);
    }

    return primary();
}

private Expr primary() {
    // Literals are all very straightforward - they just get their corresponding value
    if (match(FALSE)) {
etc...

Which makes sense, because you can't start an expression with '<=', '*', or any other binary operator from the precedence hierarchy. So, the tokens that unary and primary are trying to find are the only possible starting points for an expression, and if we don't find one of those initially, then that's an error. Am I correct?
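To check that reasoning, here is a minimal sketch that logs the order in which tokens are consumed. It is not the book's code: the class name DescentTrace, the cut-down Type enum, and the trace helper are all hypothetical, term and factor are skipped, and no Expr nodes are built. The point it tries to show is that comparison only looks for '>' after the descent has already consumed the left operand and returned, so an operator only reaches primary when it sits where an expression should start.

```java
import java.util.ArrayList;
import java.util.List;

public class DescentTrace {
    // Hypothetical, pared-down token kinds; the real scanner has many more.
    enum Type { NUMBER, GREATER_THAN, MINUS, EOF }

    private final List<Type> tokens;
    private int current = 0;
    final List<String> log = new ArrayList<>();

    DescentTrace(List<Type> tokens) { this.tokens = tokens; }

    private Type peek() { return tokens.get(current); }

    // Consume the current token only if it has one of the given types.
    private boolean match(Type... types) {
        for (Type t : types) {
            if (peek() == t) {
                log.add("consume " + t);
                current++;
                return true;
            }
        }
        return false;
    }

    void comparison() {
        log.add("enter comparison");
        unary();                          // left operand is parsed first
        while (match(Type.GREATER_THAN)) { // only now do we look for '>'
            unary();                      // right operand
        }
        log.add("exit comparison");
    }

    void unary() {
        if (match(Type.MINUS)) { unary(); return; }
        primary();
    }

    void primary() {
        if (match(Type.NUMBER)) return;
        // Nothing that can start an expression was found.
        throw new IllegalStateException("Expected expression, found " + peek());
    }

    // Parse the given tokens (EOF is appended) and return the event log.
    static List<String> trace(Type... input) {
        List<Type> tokens = new ArrayList<>(List.of(input));
        tokens.add(Type.EOF);
        DescentTrace parser = new DescentTrace(tokens);
        parser.comparison();
        return parser.log;
    }

    public static void main(String[] args) {
        // "1 > 2": the NUMBER is consumed before '>' is ever tested.
        System.out.println(trace(Type.NUMBER, Type.GREATER_THAN, Type.NUMBER));
        // "> 2": the operator is where an expression should start, so primary throws.
        try {
            trace(Type.GREATER_THAN, Type.NUMBER);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

For the input 1 > 2 the log reads: enter comparison, consume NUMBER, consume GREATER_THAN, consume NUMBER, exit comparison. For > 2 the sketch throws "Expected expression, found GREATER_THAN", which matches the case described above where the token cannot start an expression.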
