IC_
IC_

Reputation: 1789

Please, help me to understand parsing trees example from craftinginterpreters.com

I'm trying to build a C compiler from scratch. Googling took me to the https://craftinginterpreters.com/ site. There are a lot of explanations how source code parsers work in detail. I've reached "Parsing expressions" chapter and stuck in this.
They offer that algorithm of creating parse tree (recursive descend)

Expr expression() {
    Expr left = equality();
    ...
}

Expr equality() {
    Expr left = comparison();
    ...
}

Expr comparison() {
    Expr left = term();
    ...
}

Expr term() {
    Expr left = factor();
    ...
}

Expr factor() {
    Expr left = unary();
    ...
}

Expr unary() {
    if (WeFoundToken('-')) {
        return Expr.Unary(prev(), unary());
    }
    return primary; // if there's no '-'
}

But what happens when we parse that simple expression: 1 * 2 - 3
I cannot understand how that's going to be working because top-level expression() when called falls through all lower-precedence operators parsing functions to the lowest precedence unary() function which will iterate through tokens, find (-) and return it to us as a unary (-) operation (-3) expression
In that case, there will be no way to build a valid tree that would look like this:

   -
 *   3
1 2

That would be invalid without a root node:

 *  
1 2  (-3)

Upvotes: 0

Views: 85

Answers (2)

rici
rici

Reputation: 241671

unary() function which will iterate through tokens

No, it doesn't. It looks only at the next input token and sees that it is not a -. So it returns primary().

The actual code from the page your linked in your question is:

  private Expr unary() {
    if (match(BANG, MINUS)) {
      Token operator = previous();
      Expr right = unary();
      return new Expr.Unary(operator, right);
    }

    return primary();
  }

That function calls match, not WeFoundToken. (I'm not sure where that comes from). match is defined earlier on the page:

  private boolean match(TokenType... types) {
    for (TokenType type : types) {
      if (check(type)) {
        advance();
        return true;
      }
    }

    return false;
  }

The loop in that function (for (TokenType type : types)) loops over the arguments to the call, comparing each one in turn with the next input token. It never looks at any other input token.

Upvotes: 1

General Grievance
General Grievance

Reputation: 4998

Let's step through the example expression 1 * 2 - 3. The parser starts at the beginning.

  • The first token matches 1 in primary. Consume 1.
  • Control returns to factor, expr is set to 1. The while condition then matches the *. Consume *.
    • In the while loop, it tries to consume a unary next, this successfully matches primary 2. Consume 2.
    • Expression is set to 1 * 2. No more [*/] to consume, loop terminates. Return this expression to term.
  • term enters while loop and sees -. Consumed - (meaning the next token is 3, not -)
    • Tries to consume a factor, which successfully matches 3 in primary. Consume the 3.
    • Expression is set to 1 * 2 - 3.

This results in the tree:

    -
  *   3
 1 2

In other words, because term has already consumed 1 * 2 as a factor, term will enter the while loop, not call factor again. This successfully recognizes the - as an operator in term instead of part of a unary expression.

Upvotes: 2

Related Questions