Moritz Roessler

Reputation: 8621

Crockford's Top Down Operator Precedence

Out of interest, I want to learn how to write a parser for a simple language, and ultimately write an interpreter for my own little code-golfing language once I understand how such things work in general.

So I started reading Douglas Crockford's article Top Down Operator Precedence.

Note: You should probably read the article if you want a deeper understanding of the context of the code snippets below.

I have trouble understanding how the var statement and the assignment operator = should work together.

D.C. defines an assignment operator like

var assignment = function (id) {
    return infixr(id, 10, function (left) {
        if (left.id !== "." && left.id !== "[" &&
                left.arity !== "name") {
            left.error("Bad lvalue.");
        }
        this.first = left;
        this.second = expression(9);
        this.assignment = true;
        this.arity = "binary";
        return this;
    });
};
assignment("=");  

Note: [[value]] refers to a token, simplified to its value

Now if the expression function reaches e.g. [[t]], [[=]], [[2]], the result of [[=]].led is something like this:

{
    "arity": "binary",
    "value": "=",
    "assignment": true, //<-
    "first": {
        "arity": "name",
        "value": "t"
    },
    "second": {
        "arity": "literal",
        "value": "2"
    }
}

D.C. makes the assignment function because

we want it to do two extra bits of business: examine the left operand to make sure that it is a proper lvalue, and set an assignment member so that we can later quickly identify assignment statements.

Which makes sense to me up to the point where he introduces the var statement, which is defined as follows.

The var statement defines one or more variables in the current block. Each name can optionally be followed by = and an initializing expression.

stmt("var", function () {
    var a = [], n, t;
    while (true) {
        n = token;
        if (n.arity !== "name") {
            n.error("Expected a new variable name.");
        }
        scope.define(n);
        advance();
        if (token.id === "=") {
            t = token;
            advance("=");
            t.first = n;
            t.second = expression(0);
            t.arity = "binary";
            a.push(t);
        }
        if (token.id !== ",") {
            break;
        }
        advance(",");
    }
    advance(";");
    return a.length === 0 ? null : a.length === 1 ? a[0] : a;
});

Now if the parser reaches a sequence of tokens like [[var]], [[t]], [[=]], [[1]], the generated tree would look something like this:

{
    "arity": "binary",
    "value": "=",
    "first": {
        "arity": "name",
        "value": "t"
    },
    "second": {
        "arity": "literal",
        "value": "1"
    }
}

The key part of my question is the if (token.id === "=") {...} part.

I don't understand why we call

    t = token;
    advance("=");
    t.first = n;
    t.second = expression(0);
    t.arity = "binary";
    a.push(t);

rather than

    t = token;
    advance("=");
    t.led (n);
    a.push(t);  

in the ... part.

That would call our [[=]] operator's led function (the one created by the assignment function), which does

make sure that it is a proper lvalue, and set an assignment member so that we can later quickly identify assignment statements. For example:

{
    "arity": "binary",
    "value": "=",
    "assignment": true,
    "first": {
        "arity": "name",
        "value": "t"
    },
    "second": {
        "arity": "literal",
        "value": "1"
    }
}

Since there is no operator with an lbp between 0 and 10, calling expression(0) vs. expression(9) makes no difference. (!(0<0) && !(9<0) && 0<10 && 9<10)
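One way to check this claim is a stripped-down Pratt loop. The sketch below is not Crockford's code: the parse helper, the token regex, and the binding powers for + and * are assumptions made for illustration. Only the binding power 10 for = and the recursion with lbp - 1 for the right-associative operator follow the snippets above.

```javascript
// Minimal Pratt-style expression parser (a sketch, not the article's code).
// Assumed binding powers: "=" is 10 (as in the article), "+" 50, "*" 60.
const LBP = new Map([["=", 10], ["+", 50], ["*", 60]]);

function parse(source, startRbp) {
    const tokens = source.match(/[A-Za-z_]\w*|\d+|[=+*]/g) || [];
    let pos = 0;

    // Binding power of the upcoming token; 0 for operands and end of input.
    const lbp = () => LBP.get(tokens[pos]) || 0;

    function expression(rbp) {
        // "nud" step: names and literals stand for themselves.
        let left = tokens[pos++];
        // "led" step: consume operators that bind tighter than rbp.
        while (rbp < lbp()) {
            const op = tokens[pos++];
            // "=" is right associative, so it recurses with lbp - 1.
            const power = LBP.get(op);
            const right = expression(op === "=" ? power - 1 : power);
            left = [op, left, right];
        }
        return left;
    }
    return expression(startRbp);
}

const withZero = JSON.stringify(parse("a + b * 2", 0));
const withNine = JSON.stringify(parse("a + b * 2", 9));
console.log(withZero === withNine);
```

With these binding powers every operator has an lbp of at least 10, so the loop condition rbp < lbp() gives the same answer for a starting rbp of 0 and of 9.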

And the token.id === "=" condition prevents assignments to an object member, since token.id would then be either '[' or '.' and t.led wouldn't be called.

My question in short is:

Why do we not call the led function of the assignment operator, which may optionally follow a variable declaration, and instead manually set the first and second members of the statement, but not the assignment member?

Here are two fiddles parsing a simple string: one using the original code and one using the assignment operator's led.
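A stripped-down sketch of the same comparison, for a single var NAME = VALUE; statement. The helpers tokenize, parseVar and assignmentLed are hypothetical names, and the expression function here only handles a single name or literal; the two branches mirror the snippets quoted above.

```javascript
// Sketch of a tiny parser for "var NAME = VALUE;" only (not the article's code).
function tokenize(src) {
    return (src.match(/[A-Za-z_]\w*|\d+|[=;,]/g) || []).map(function (text) {
        if (/^\d/.test(text)) return { id: "(literal)", arity: "literal", value: text };
        if (/^[=;,]$/.test(text) || text === "var") return { id: text, value: text };
        return { id: "(name)", arity: "name", value: text };
    });
}

function parseVar(src, useLed) {
    const tokens = tokenize(src);
    let pos = 0;
    let token = tokens[pos];

    function advance(id) {
        if (id && token.id !== id) throw new Error("Expected '" + id + "'.");
        token = tokens[++pos] || { id: "(end)" };
    }

    // Simplification: the right-hand side is a single name or literal,
    // so the binding power argument is accepted but not needed.
    function expression(rbp) {
        const left = token;
        advance();
        return left;
    }

    // Mirrors the led function that assignment("=") installs.
    function assignmentLed(left) {
        if (left.id !== "." && left.id !== "[" && left.arity !== "name") {
            throw new Error("Bad lvalue.");
        }
        this.first = left;
        this.second = expression(9);
        this.assignment = true;
        this.arity = "binary";
        return this;
    }

    advance("var");
    const n = token;
    if (n.arity !== "name") throw new Error("Expected a new variable name.");
    advance();
    const t = token;
    advance("=");
    if (useLed) {
        assignmentLed.call(t, n);      // variant from the question
    } else {
        t.first = n;                   // variant from the article
        t.second = expression(0);
        t.arity = "binary";
    }
    advance(";");
    return t;
}

const original = parseVar("var t = 1;", false);
const viaLed = parseVar("var t = 1;", true);
console.log(original.assignment, viaLed.assignment);
```

In this sketch both variants build the same first and second members; the only difference in the resulting node is the assignment flag.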

Upvotes: 26

Views: 1476

Answers (3)

Benjamin Gruenbaum

Reputation: 276306

When parsing a language, two things matter: semantics and syntax.

Semantically, var x=5; and var x;x=5 seem very close if not identical, since in both cases a variable is first declared and then a value is assigned to it. This is what you've observed, and it is correct for the most part.

Syntactically however, the two differ (which is clearly visible).

In natural language, an analogue would be:

  • The boy has an apple.
  • There is an apple, the boy has it.

Now to be concise! Let's look at the two examples.

While the two (pretty much) mean the same thing, they are clearly not the same sentence. Back to JavaScript!

The first one: var x=5 is read the following way:

var                      x              =                  5
-----------------------VariableStatement--------------------
var -------------------        VariableDeclarationList 
var -------------------        VariableDeclaration
var            Identifier -------   Initialiser(opt)
var ------------------- x              = AssignmentExpression
var ------------------- x ------------ = LogicalORExpression
var ------------------- x ------------ = LogicalANDExpression
var ------------------- x ------------ = BitwiseORExpression
var ------------------- x ------------ = BitwiseXORExpression
var ------------------- x ------------ = BitwiseANDExpression 
var ------------------- x ------------ = EqualityExpression
var ------------------- x ------------ = ShiftExpression
var ------------------- x ------------ = AdditiveExpression
var ------------------- x ------------ = MultiplicativeExpression
var ------------------- x ------------ = UnaryExpression
var ------------------- x ------------ = PostfixExpression 
var ------------------- x ------------ = NewExpression
var ------------------- x ------------ = MemberExpression
var ------------------- x ------------ = PrimaryExpression
var ------------------- x ------------ = Literal
var ------------------- x ------------ = NumericLiteral
var ------------------- x ------------ = DecimalLiteral
var ------------------- x ------------ = DecimalDigit 
var ------------------- x ------------ = 5

Phew! All this had to happen syntactically to parse var x = 5. Sure, a lot of it is handling expressions, but it is what it is. Let us check the other version.

This breaks into two statements: var x; x = 5. The first one is:

var                      x 
--------VariableStatement---
var ---- VariableDeclarationList 
var ---- VariableDeclaration
var                 Identifier (optional initializer not present)
var                      x

The second part is x=5, which is an assignment statement. I could go on with the same expression madness, but it's pretty much the same.

So in conclusion: the two produce the same result semantically, but syntactically, as the official language grammar specifies, they are different. The result in this case is indeed the same.

Upvotes: 8

Palec

Reputation: 13551

Assignment (e.g. var t; t = 1;) is conceptually different from initialization (e.g. var t = 1;), although both result in a change of memory state. Using the same piece of code to implement both is not desirable, as one could change independently of the other in a future version of the language.

The conceptual difference can be shown in C++ when talking about assignment operator overloading and copy constructors. Initialization can invoke the copy constructor; assignment can invoke the assignment operator overload. Assignment never triggers the copy constructor, and initialization never makes use of the assignment operator overload. See tutorial on copy constructor and assignment operator overloading.

Another example is the one by Strix: far from all l-values can be used after var in JavaScript. I think this is the biggest difference between them in JavaScript, if not the only one (ignoring the obvious scoping change introduced by var, of course).

One could think of the use of the equals sign for both as a coincidence. Pascal uses := for assignment and = for initialization. JavaScript could as well use something like var t : 1;.

Upvotes: 1

Strix

Reputation: 1114

I don't have time to read the whole article, so I am not a hundred percent sure. In my opinion, the reason is that the assignment operator in a var statement is a bit special. It doesn't accept all possible left values: no members of an object are allowed (no . or [ operators). Only plain variable names are allowed.

So we can't use the normal assignment function, because it allows all left values.
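To make the difference concrete, here is a small sketch of the two left-operand checks side by side. The function names and the sample token objects are hypothetical; the conditions are taken from the assignment and var snippets quoted in the question.

```javascript
// Sketch: the two left-operand checks from the question's snippets.
// Function names and sample tokens are assumptions for illustration.

// Accepted by the "=" operator's led: names plus "." and "[" member accesses.
function isAssignmentTarget(left) {
    return left.id === "." || left.id === "[" || left.arity === "name";
}

// Accepted by the var statement: plain names only.
function isVarTarget(node) {
    return node.arity === "name";
}

const plainName = { id: "(name)", arity: "name", value: "t" };
const memberAccess = { id: ".", arity: "binary", value: "." };

console.log(isAssignmentTarget(plainName), isVarTarget(plainName));       // true true
console.log(isAssignmentTarget(memberAccess), isVarTarget(memberAccess)); // true false
```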

I am quite sure about this, but the following is just a guess:

We would have to call the assignment function optionally, and only after we had checked that we consumed the assignment operator.

  advance();
  if (token.id === "=") {
      // OK, Now we know that there is an assignment.

But the assignment function assumes that the current token is a left value, not the operator =.

I have no idea why the assignment member is not set to true. It depends on what you want to do with the generated tree. Again, assignment in a var statement is a bit special, and it might not be feasible to set it.

Upvotes: 1
