mailmindlin
mailmindlin

Reputation: 624

Spaces required between keyword and literal

Looking at the output of UglifyJS2, I noticed that no spaces are required between literals and the in operator (e.g., 'foo'in{foo:'bar'} is valid).

Playing around with Chrome's DevTools, however, I noticed that hex and binary number literals require a space before the in keyword:

enter image description here

Internet explorer returned true to all three tests, while FireFox 48.0.1 threw a SyntaxError for the first one (1in foo), however it is okay with string literals ('1'in foo==true).

It seems that there should be no problem parsing JavaScript, allowing for keywords to be next to numeric literals, but I can't find any explicit rule in the ECMAScript specification (any of them).

Further testing shows that statements like for(var i of[1,2,3])... are allowed in both Chrome and FireFox (IE11 doesn't support for..of loops), and typeof"string" works in all three.

Which behavior is correct? Is it, in fact, defined somewhere that I missed, or are all these effects a result of idiosyncrasies of each browser's parser?

Upvotes: 3

Views: 608

Answers (2)

mailmindlin
mailmindlin

Reputation: 624

TL;DR There would be no change to how code would be interpreted if whitespace weren't required in these circumstances, but it's part of the spec.

Looking at the source code of v8 that handles number literal parsing, it cites ECMA 262 § 7.8.3:

The source character immediately following a NumericLiteral must not be an IdentifierStart or DecimalDigit.

NOTE For example:

3in

is an error and not the two input elements 3 and in.

This section seems to contradict the introduction of section 7. However, it does not seem that there would be any problems with breaking that rule and allowing for 3in to be parsed. There are cases where allowing for no spaces between literals and identifiers would change how the source is parsed, but all cases merely change which errors are generated.

Upvotes: 1

EML
EML

Reputation: 10281

Not an expert - I haven't done a JS compiler, but have done others.

ecma-262.pdf is a bit vague, but it's clear that an expression such as 1 in foo should be parsed as 3 input elements, which are all tokens. Each token is a CommonToken (11.5); in this case, we get numericLiteral, identifierName (yes, in is an identifierName), and identifierName. Exactly the same is true when parsing 0b1 in foo (see 11.8.3).

So, what happens when you take out the WS? It's not covered explicitly (as far as I can see), but it's common practice (in other languages) when writing a lexer to scan the longest character sequence that will match something you could potentially be looking for. The introduction to section 11 pretty much says exactly that:

The source text is scanned from left to right, repeatedly taking the longest possible sequence of code points as the next input element.

So, for 0b1in foo the lexer goes through 0b1, which matches a numeric literal, and reaches i, giving 0b1i, which doesn't match anything. So it passes the longest match (0b1) to the rest of the parser as a token, and starts again at i. It finds n, followed by WS, so passes in as the second token, and so on.

So, basically, and rather bizarrely, it looks like IE is correct.

Upvotes: 2

Related Questions