Reputation: 373
I likely just need to step away from the computer for a moment, but I'm puzzled about one detail in how this recursive descent parser is working. It is in a book called Crafting Interpreters by Bob Nystrom (here) that I'm following along with. It is implementing parsing for expressions, taking advantage of the call stack to implement precedence. We start with the expression
function, which expands to equality
(the lowest level of precedence). This then immediately calls comparison
(operators with the next level of precedence) and onward:
private Expr expression() {
return equality();
}
private Expr equality() {
Expr expr = comparison();
// If we entered this loop, then we know we've found one of the two equality operators
// Stick the operator into 'operator', so that we know which type of expression we have ('!=' or '==')
while (match(NOT_EQUAL, EQUAL_EQUAL)) {
Token operator = previous();
Expr right = comparison();
expr = new Expr.Binary(expr, operator, right);
}
return expr;
}
private Expr comparison() {
Expr expr = term();
while (match(GREATER_THAN, GREATER_EQUAL, LESS_THAN, LESS_EQUAL)) {
Token operator = previous();
Expr right = term();
expr = new Expr.Binary(expr, operator, right);
}
return expr;
}
Which all makes sense and is perfectly fine. It keeps going through PLUS
AND MINUS
(term
), factor
, and unary
expressions until we hit the lowest level, and the highest precedence, primary
:
private Expr primary() {
// Literals are all very straightforward - they just get their corresponding value
if (match(FALSE)) {
return new Expr.Literal(false);
}
if (match(TRUE)) {
return new Expr.Literal(true);
}
if (match(NIL)) {
return new Expr.Literal(null);
}
if (match(NUMBER, STRING)) {
return new Expr.Literal(previous().literal);
}
// Consume until we find a ')'. An opening parenthesis without a matching closing parenthesis is an error.
if (match(LEFT_PAREN)) {
Expr expr = expression();
consume(RIGHT_PAREN, "Expecting to find ')' after expression!");
return new Expr.Grouping(expr);
}
// If we got right down to primary() and still nothing matches, then the token we have cannot possibly be the
// start of an expression. In this case, throw an error.
throw error(peek_this(), "Expected expression.");
}
If we still haven't matched an operator by this point, then we throw an error. Otherwise, we would have built the expression and returned it back up the call stack at some earlier point. This all makes sense, and it also works and runs absolutely fine.
My sticking point is this: how do we ever actually match any operator? Let's say for example that the current token is a GREATER_THAN
('>
'). The function that should find it and build the new expression with it, comparison
, calls term
before doing that, so we end up just dropping straight down to primary
. Obviously we aren't going to match '>
' anywhere else, so how do we not end up just throwing an error for it? Or for any operator token, for that matter?
As I said above this does actually work as expected, but I'm probably just making one stupid mistake or missing a detail as to how. Normally I'll have an epiphany while trying to explain something in an SO post, but I haven't had it click yet so would much appreciate some explanation as to what is going on here. Thank you!
EDIT: I might understand now, but would like some confirmation. I think the key is that the unary
function, which matches unary !
and -
tokens, only calls primary
after checking to see if we have one of those tokens:
private Expr unary() {
if (match(EXCLAMATION, MINUS)) {
Token operator = previous();
Expr right = unary();
return new Expr.Unary(operator, right);
}
return primary();
}
private Expr primary() {
// Literals are all very straightforward - they just get their corresponding value
if (match(FALSE)) {
etc...
Which makes sense, because you can't start an expression with '<='
, '*'
, or anything else further down the precendence hierarchy. So, the tokens that unary
and primary
are tying to find are the only possible start-points for an expression, and if we don't find either of those initially, then that's an error. Am I correct?
Upvotes: 2
Views: 471
Reputation: 16375
First, the book, "Crafting Interpreters", by Robert Nystrom is great! Hands on, I am following it through as well and love it.
To understand the recursive decent Parser in a high level view is not too difficult, given we got the grammar, and deduct the code from there. We can happily hack it into the machine, much like a code monkey.
However, following a real example, step by step, makes it more difficult to follow in detail what is happening. In my case, I can only just grasp a few lines of code that are in front of me and then forget again. Like where to return to and what the state is...
Spoiler alert: The first line in the comparision()
method...
Expr expr = term();
...moves the "reading head" to the next token, GREATER
, while descending. For a detailed walk-through, read on...
By taking advantage of the call-stack to walk through the token list, we cannot easily tell from the code where the "reading head" of the parser is at a certain point of time. The recursion obscures this even more.
To your question: "how do we ever actually match any operator?".
Given the comparision()
method from the book:
private Expr comparison() {
Expr expr = term();
while (match(GREATER, GREATER_EQUAL, LESS, LESS_EQUAL)) {
Token operator = previous();
Expr right = term();
expr = new Expr.Binary(expr, operator, right);
}
return expr;
}
Let's assume we are somewhere in the middle of a list of input tokens
:
...
NUMBER 5
GREATER
NUMBER 3
...
From the grammar, we know that a comparision yields either a term only, or a term followed by a comparision operator, followed by a term (and that possibly multiple times):
comparison → term ( ( ">" | ">=" | "<" | "<=" ) term )* ;
When we enter the comparision method, the current token is NUMBER 5
. This is the "left-side" in the above grammar rule and is evaluated in:
Expr expr = term();
Whatever exactly happens down within term()
here, does not matter, afterall we get back an expression that is "compatible" with term
.
But wait! While evaluating term()
, the Literal with value 5 is found down the line (probably in primary()
)! When that happens, the advance()
method is called and the "reading head" is moved forward by one (current++
)!
Bang! Therefore, the current token after the call to term()
is now GREATER
and not NUMBER 5
...
It is kind of like the code moved the datastructure on without us as a witness. I guess this is the point that is easy to miss here...
To complete the example: Now with the current token GREATER
, we test whether to enter the while loop or not:
while (match(GREATER, GREATER_EQUAL, LESS, LESS_EQUAL)) {
In our case, match()
returns true and we enter into it. Down in match()
, if true, the advance()
method is called. It moves the "reading head" to the next token, NUMBER 3
. (Practically speaking: It calls current++
.)
We now got the Literal 5.0 in the variable expr
and set operator
to the previous token, which is still GREATER
(We do this only now, cause, well..., now we know!).
Token operator = previous();
No recursion happens here, so, no headaches. The next line evaluates the current token, now NUMBER 3
, to the Literal 3 and sets it to the variable right
.
Expr right = term();
This call recurses again but we don't need to worry, we simply wait for how ever many steps are being run all around us. Eventually, from our viewpoint instantly, the variable right
contains a term.
We put expr
, operator
, and right
together to form a Binary expression and return it. With this, "our" part, the example, is done.
(It also caused me some headaches and it took me a while to trace it all down. I did so with the example from the book: -123 * (45.67)
. You can have a look here, if it helps.)
Upvotes: 1