Reputation: 8621
Out of interest, I want to learn how to write a parser for a simple language, so that I can eventually write an interpreter for my own little code-golfing language once I understand how such things work in general.
So I started reading Douglas Crockford's article Top Down Operator Precedence.
Note: You should probably read the article if you want a deeper understanding of the context of the code snippets below.
I have trouble understanding how the var statement and the assignment operator = should work together.
D.C. defines an assignment operator like
var assignment = function (id) {
    return infixr(id, 10, function (left) {
        if (left.id !== "." && left.id !== "[" &&
                left.arity !== "name") {
            left.error("Bad lvalue.");
        }
        this.first = left;
        this.second = expression(9);
        this.assignment = true;
        this.arity = "binary";
        return this;
    });
};
assignment("=");
Note: [[value]] refers to a token, simplified to its value
Now if the expression function reaches e.g. [[t]],[[=]],[[2]], the result of [[=]].led is something like this.
{
    "arity": "binary",
    "value": "=",
    "assignment": true, //<-
    "first": {
        "arity": "name",
        "value": "t"
    },
    "second": {
        "arity": "literal",
        "value": "2"
    }
}
D.C. writes the assignment function because, in his words,
"we want it to do two extra bits of business: examine the left operand to make sure that it is a proper lvalue, and set an assignment member so that we can later quickly identify assignment statements."
This makes sense to me up to the point where he introduces the var statement, which is defined as follows.
The var statement defines one or more variables in the current block. Each name can optionally be followed by = and an initializing expression.
stmt("var", function () {
    var a = [], n, t;
    while (true) {
        n = token;
        if (n.arity !== "name") {
            n.error("Expected a new variable name.");
        }
        scope.define(n);
        advance();
        if (token.id === "=") {
            t = token;
            advance("=");
            t.first = n;
            t.second = expression(0);
            t.arity = "binary";
            a.push(t);
        }
        if (token.id !== ",") {
            break;
        }
        advance(",");
    }
    advance(";");
    return a.length === 0 ? null : a.length === 1 ? a[0] : a;
});
Now if the parser reaches a set of tokens like [[var]],[[t]],[[=]],[[1]], the generated tree would look something like this.
{
    "arity": "binary",
    "value": "=",
    "first": {
        "arity": "name",
        "value": "t"
    },
    "second": {
        "arity": "literal",
        "value": "1"
    }
}
The key part of my question is the if (token.id === "=") {...} part.
I don't understand why we call
t = token;
advance("=");
t.first = n;
t.second = expression(0);
t.arity = "binary";
a.push(t);
rather than
t = token;
advance("=");
t.led (n);
a.push(t);
in the {...} part. This would call the led function of our [[=]] operator (the assignment function), which does make sure that the left operand is a proper lvalue and does set an assignment member so that we can later quickly identify assignment statements, e.g.
{
    "arity": "binary",
    "value": "=",
    "assignment": true,
    "first": {
        "arity": "name",
        "value": "t"
    },
    "second": {
        "arity": "literal",
        "value": "1"
    }
}
Since there is no operator with an lbp between 0 and 10, calling expression(0) vs. expression(9) makes no difference (!(0<0) && !(9<0) && 0<10 && 9<10).
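To make the binding-power argument concrete, here is a minimal sketch of a Pratt-style expression loop. It is not Crockford's code: the parse function, the whitespace tokenizer and the array-shaped trees are made up for illustration, and only the binding power of "=" (10, right associative) is taken from the article; the 50 for "+" is simply assumed to be some value above 10.

```javascript
// Hypothetical, heavily simplified Pratt-style parser over space-separated tokens.
function parse(source, startRbp) {
    var tokens = source.split(" ");
    var pos = 0;
    // Left binding powers: "=" is 10 as in the article; "+" is assumed to be 50.
    var lbp = { "=": 10, "+": 50 };

    // Any token without an entry (names, literals, ";", end of input) binds with 0.
    function power(tok) {
        return Object.prototype.hasOwnProperty.call(lbp, tok) ? lbp[tok] : 0;
    }

    function expression(rbp) {
        var left = tokens[pos++];              // a name or literal stands for itself
        while (rbp < power(tokens[pos])) {
            var op = tokens[pos++];
            // "=" is right associative (recurses with lbp - 1), "+" is left associative.
            var right = expression(op === "=" ? power(op) - 1 : power(op));
            left = [op, left, right];
        }
        return left;
    }
    return expression(startRbp);
}

// Parse the same initialiser starting with rbp 0 and with rbp 9.
var inputs = ["1 + 2 ;", "b = 1 + 2 ;"];
var results = inputs.map(function (src) {
    return [JSON.stringify(parse(src, 0)), JSON.stringify(parse(src, 9))];
});
results.forEach(function (pair) {
    console.log(pair[0] === pair[1], pair[0]);
});
```

In this sketch both starting values give the same tree for each input, because the loop condition rbp < lbp only changes its outcome for a token whose lbp lies in the range 1 to 9, and no such token exists here.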
And the token.id === "=" condition prevents assignments to an object member, as token.id would be either '[' or '.' and t.led wouldn't be called.
My question in short is:
Why do we not call the led function of the assignment operator, which may optionally follow a variable declaration, and instead manually set the first and second members of the statement but not the assignment member?
Here are two fiddles parsing a simple string: one using the original code and one using the assignment operator's led.
Upvotes: 26
Views: 1476
Reputation: 276306
When parsing a language, two things matter: semantics and syntax.
Semantically, var x=5; and var x;x=5 seem very close, if not identical (since in both cases a variable is first declared and then a value is assigned to that declared variable). This is what you've observed, and it is correct for the most part.
Syntactically however, the two differ (which is clearly visible).
In natural language, an analogue would be:
Now to be concise! Let's look at the two examples.
While the two (pretty much) mean the same thing, they are clearly not the same sentence. Back to JavaScript!
The first one, var x=5, is read the following way:
var x = 5
-----------------------VariableStatement--------------------
var ------------------- VariableDeclarationList
var ------------------- VariableDeclaration
var Identifier ------- Initialiser(opt)
var ------------------- x = AssignmentExpression
var ------------------- x ------------ = LogicalORExpression
var ------------------- x ------------ = LogicalANDExpression
var ------------------- x ------------ = BitwiseORExpression
var ------------------- x ------------ = BitwiseXORExpression
var ------------------- x ------------ = BitwiseANDExpression
var ------------------- x ------------ = EqualityExpression
var ------------------- x ------------ = ShiftExpression
var ------------------- x ------------ = AdditiveExpression
var ------------------- x ------------ = MultiplicativeExpression
var ------------------- x ------------ = UnaryExpression
var ------------------- x ------------ = PostfixExpression
var ------------------- x ------------ = NewExpression
var ------------------- x ------------ = MemberExpression
var ------------------- x ------------ = PrimaryExpression
var ------------------- x ------------ = Literal
var ------------------- x ------------ = NumericLiteral
var ------------------- x ------------ = DecimalLiteral
var ------------------- x ------------ = DecimalDigit
var ------------------- x ------------ = 5
Phew! All this had to happen syntactically to parse var x = 5. Sure, a lot of it is handling expressions, but it is what it is. Let us check the other version.
This breaks into two statements: var x; x = 5
The first one is:
var x
--------VariableStatement---
var ---- VariableDeclarationList
var ---- VariableDeclaration
var Identifier (optional initializer not present)
var x
The second part is x=5, which is an assignment statement. I could go on with the same expression madness, but it's pretty much the same.
So in conclusion, while the two produce the same result semantically, they are syntactically different, as the official language grammar specifies. The result, in this case, is indeed the same.
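As a rough illustration of that syntactic difference, the two programs can be written out as hand-made trees. The node names below follow the ESTree convention (VariableDeclaration, VariableDeclarator, ExpressionStatement, AssignmentExpression), not Crockford's token objects, and the objects are typed by hand and simplified rather than produced by a real parser.

```javascript
// Hand-written, simplified ESTree-style tree for: var x = 5;
var declarationWithInit = [
    { type: "VariableDeclaration", kind: "var", declarations: [
        { type: "VariableDeclarator",
          id: { type: "Identifier", name: "x" },
          init: { type: "Literal", value: 5 } }
    ] }
];

// Hand-written, simplified ESTree-style tree for: var x; x = 5;
var declarationThenAssignment = [
    { type: "VariableDeclaration", kind: "var", declarations: [
        { type: "VariableDeclarator",
          id: { type: "Identifier", name: "x" },
          init: null }
    ] },
    { type: "ExpressionStatement", expression: {
        type: "AssignmentExpression", operator: "=",
        left: { type: "Identifier", name: "x" },
        right: { type: "Literal", value: 5 } } }
];

// List the statement types of each program to compare their shapes.
function statementTypes(program) {
    return program.map(function (statement) { return statement.type; });
}

console.log(statementTypes(declarationWithInit));
console.log(statementTypes(declarationThenAssignment));
```

The first program is a single statement whose initialiser hangs off the declarator; the second is two statements, and only the second one contains an assignment expression node.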
Upvotes: 8
Reputation: 13551
Assignment (e.g. var t; t = 1;) is conceptually different from initialization (e.g. var t = 1;), although both result in a change of memory state. Using the same piece of code to implement both is not desirable, as one could change independently of the other in a future version of the language.
The conceptual difference can be illustrated in C++ when talking about assignment operator overloading and copy constructors. Initialization can invoke the copy constructor; assignment can invoke an overloaded assignment operator. Assignment never triggers the copy constructor, and initialization never makes use of the assignment operator overload. See a tutorial on copy constructors and assignment operator overloading.
Another example is the one by Strix: by no means can every l-value be used after var in JavaScript. I think this is the biggest difference between them in JavaScript, if not the only one. Ignoring the obvious scoping change in var, of course.
One could think of the use of the equals sign for both as a coincidence. Pascal uses := for assignment and = for initialization. JavaScript could just as well use something like var t : 1;.
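To sketch what keeping the two code paths separate might look like, here is a small stand-alone declaration parser. Everything in it is hypothetical (the function name parseVarDeclarations, the pre-split token array, the "declare" and "initialise" node kinds, and the restriction to single-token initialisers); it is not taken from Crockford's parser and only mirrors the idea that var handles its own = without going through the general assignment operator.

```javascript
// Hypothetical helper: parses the tokens after `var`, e.g. ["t", "=", "1", ",", "u", ";"].
// It builds its own node kinds instead of reusing an assignment operator.
function parseVarDeclarations(tokens) {
    var pos = 0;
    var nodes = [];
    while (true) {
        var name = tokens[pos++];
        // Only plain names are accepted here; member targets such as t.x are not.
        if (typeof name !== "string" || !/^[A-Za-z_$][\w$]*$/.test(name)) {
            throw new SyntaxError("Expected a new variable name.");
        }
        var node = { kind: "declare", name: name, init: null };
        if (tokens[pos] === "=") {
            pos++;                          // consume "="
            node.kind = "initialise";
            node.init = tokens[pos++];      // simplification: one-token initialiser
        }
        nodes.push(node);
        if (tokens[pos] !== ",") {
            break;
        }
        pos++;                              // consume ","
    }
    if (tokens[pos] !== ";") {
        throw new SyntaxError("Expected ';'.");
    }
    return nodes;
}

var nodes = parseVarDeclarations(["t", "=", "1", ",", "u", ";"]);
console.log(JSON.stringify(nodes));

// A member target is rejected, because "." is neither "=", "," nor ";".
var rejected = false;
try {
    parseVarDeclarations(["t", ".", "x", "=", "1", ";"]);
} catch (e) {
    rejected = e instanceof SyntaxError;
}
console.log(rejected);
```

With this split, a later change to what ordinary assignment accepts on its left side would not affect what var accepts, which is the independence argued for above.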
Upvotes: 1
Reputation: 1114
I don't have time to read the whole article, so I am not a hundred percent sure. In my opinion the reason is that the assignment operator in a var statement is a bit special. It doesn't accept all possible left values: no members of an object are allowed (no . or [ operators). Only plain variable names are allowed.
So we can't use the normal assignment function, because it allows all left values.
I am quite sure about this, but the following is just a guess:
We would have to call the assignment function optionally, and only after we have checked that we consumed the assignment operator.
advance();
if (token.id === "=") {
// OK, Now we know that there is an assignment.
But the assignment function assumes that the current token is a left value, not the operator =.
I have no idea why the assignment member is not set to true. It depends on what you want to do with the generated tree. Again, assignment in a var statement is a bit special, and it might not be feasible to set it.
Upvotes: 1