Learn on hard way

Reputation: 336

Way to tell JS that I'm using numbers (optimisation)

It seems that JS struggles when it sees a + between two operands: it does not know their types, so it has to work them out first. I ran into a talk (video with time stamp) where the speaker gives this example:

x = x | 0;
y = y | 0;

to tell JS that we are using numbers. I need a bit of explanation of how this works under the hood, or at least in layman's terms.

If that is the way to tell JS that x and y are numbers, how can we tell JS that variables are strings, booleans, etc. using these bitwise operators? Also, why is this not used more often in everyday applications?

Upvotes: 1

Views: 98

Answers (2)

Nickolay

Reputation: 32063

The snippet you posted is on a slide called "asm.js" and is part of a larger example:

function Module() {
  "use asm";
  function add(x, y) {
    x = x | 0;
    y = y | 0;
    return x + y;
  }
  return {add: add};
}

Note the "use asm" annotation: Mozilla introduced it to let the JS engine know that the code is intended to conform to the asm.js specification, which defines a restricted subset of JavaScript that can be compiled into efficient machine code by an "ahead-of-time" compiler.

Aside: since asm.js was introduced in 2013, browser vendors have agreed on WebAssembly, a newer standard that solves the same problem without relying on unusual JavaScript syntax.

Regular JavaScript has dynamic typing, meaning you can write function add(x, y) { return x + y; } and then use it with parameters of different types, like:

add(1,2) /* =3 */
add("Hello ", "world") /* ="Hello world" */
// or even
add("1+1=",2) /* ="1+1=2" */

Because of this the JS engine cannot generate efficient code for the function "ahead of time" -- before it sees how it's used: it would need to account for any possible types of parameters passed into it.

The way JS engines deal with this problem is:

  • start slow - by generating bytecode and running it through an interpreter, while observing the types the variables actually have
  • as some parts of the code get "hotter" (i.e. run repeatedly), they are recompiled ("just-in-time") more aggressively, using the types observed during the first step
  • since the inferred types are not guaranteed to stay the same (e.g. you can call add(int, int) 100000 times and then call add(string, string)), the optimized code has to include guards against this and fall back to the slower mode if it happens.
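
As a rough illustration of that last point, the guard-and-fallback idea can be sketched in plain JavaScript. This is only an analogy: real engines emit the guard in machine code, and every name below (addGeneric, addOptimizedForNumbers, bailouts) is made up for the example.

```javascript
// Hypothetical sketch; real engines do this in generated machine code.

// Slow, generic path: handles every type the + operator accepts.
function addGeneric(x, y) {
  return x + y;
}

// Counts how often the "optimized" version had to give up (illustrative only).
let bailouts = 0;

// "Optimized" path, specialized for the types observed so far (numbers).
function addOptimizedForNumbers(x, y) {
  // Guard: verify that the assumption made at optimization time still holds.
  if (typeof x !== "number" || typeof y !== "number") {
    bailouts += 1;            // assumption broken: fall back to the slow path
    return addGeneric(x, y);
  }
  return x + y;               // fast path: both operands are known to be numbers
}

console.log(addOptimizedForNumbers(1, 2));              // stays on the fast path
console.log(addOptimizedForNumbers("Hello ", "world")); // triggers the fallback
console.log(bailouts);
```

The guard itself costs something on every call, which is part of the overhead that statically typed code avoids.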

This is a rather smart technique and it often works well for hand-written JavaScript, but it obviously has more overhead than compiling statically typed code (like C compilers do). If you have a C(++) codebase (which you run in the browser by compiling it to JavaScript using Emscripten) - say, Unreal Engine, a chess engine, a DOS emulator, etc. - you can do better than this.

This is why Mozilla came up with asm.js: a C-to-JS compiler can embed type annotations into the generated JS and allow the JS engine to compile it efficiently right from the start.

The idea behind the a = a | 0 syntax was that they didn't want to invent a new syntax, as that would prevent code using it from running in browsers that do not support it. So they use this construct, which already coerces any input value to a 32-bit signed integer in any JS engine following the spec, to let the engines that support asm.js know the type of the variable.
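
To make the coercion concrete, here is a small sketch of what | 0 does to a few inputs in ordinary JavaScript, with no asm.js support required. The variable names are only for illustration. The last line shows unary plus, which asm.js uses in the same role to mark a value as a double.

```javascript
// Bitwise OR converts its operands to 32-bit signed integers first,
// so OR-ing with 0 returns just the converted value.
const truncated = 3.7 | 0;         // 3: the fractional part is dropped
const negative = -3.7 | 0;         // -3: truncates toward zero, unlike Math.floor
const fromString = "42" | 0;       // 42: numeric strings are converted
const notNumeric = "abc" | 0;      // 0: NaN becomes 0
const wrapped = 2147483648 | 0;    // -2147483648: 2^31 wraps around in 32 bits

// Unary plus converts to a (double-precision) number without truncating.
const asDouble = +"1.5";           // 1.5

console.log(truncated, negative, fromString, notNumeric, wrapped, asDouble);
```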

If that is the way to tell JS that x and y are numbers, how can we tell JS that variables are strings, booleans, etc. using these bitwise operators?

asm.js only supports numeric types, because a string in C is simply a pointer (i.e. an integer index) to a chunk of memory holding the individual bytes of the string. Emscripten-compiled code doesn't use JavaScript Strings.
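
A minimal sketch of that idea, assuming a small Uint8Array standing in for the program's memory and ASCII-only text. The names heap, writeCString and readCString are hypothetical helpers for this example, not Emscripten's actual API.

```javascript
// Hypothetical stand-in for the compiled program's linear memory.
const heap = new Uint8Array(64);

// Copy an ASCII string into the heap at `ptr`, NUL-terminated like a C string.
// Returns the pointer, which is just an integer index into the heap.
function writeCString(text, ptr) {
  for (let i = 0; i < text.length; i++) {
    heap[ptr + i] = text.charCodeAt(i);
  }
  heap[ptr + text.length] = 0; // terminator
  return ptr;
}

// Read bytes starting at `ptr` until the terminating 0 (or the end of the heap).
function readCString(ptr) {
  let out = "";
  for (let i = ptr; i < heap.length && heap[i] !== 0; i++) {
    out += String.fromCharCode(heap[i]);
  }
  return out;
}

const greeting = writeCString("Hello", 8); // the "string" is now the number 8
console.log(greeting, readCString(greeting));
```

The compiled code only ever passes the integer around; the bytes stay in the typed array.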

Upvotes: 1

Mark

Reputation: 92440

I think if you search around this site you will find a lot about casting numbers and the peculiar typing of JavaScript.

Personally, I don't like the formulation with the bitwise or. There are a few reasons, but mostly it's because it seems like an easy place to pick up hard-to-detect bugs. Given the function in the video posted, consider:

function add(x, y) {
  x = x | 0
  y = y | 0
  return x + y
}

console.log( add(1, Infinity) )
// 1? Shouldn't that be Infinity?

console.log( add(Infinity, Infinity) )
// 0? Shouldn't that be Infinity?

console.log( add(1, {n: 4}) )
// 1? NaN seems more reasonable

console.log( add(1, undefined) )
// 1? NaN seems more reasonable

JS has other ways to coerce values to numbers with more reasonable behavior, for example Number():

function add(x, y) {
  x = Number(x)
  y = Number(y)
  return x + y
}
    
console.log( add(2, Infinity) )
// Infinity

console.log( add(1, {m: 2}) )
// NaN

console.log( add(1, "2") )
// 3 - still casts strings as expected

console.log( add(1, [2]) )
// 3 - and single element array

One thing worth mentioning about Number() vs parseInt():

console.log(parseInt("24px", 10))
// 24 - handy for browser work

console.log(Number("24px"))
// NaN - stricter

There are similar functions for other types (String(23), Boolean(1), etc.)
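
For instance, a quick sketch of those two conversions (the variable names are only for illustration):

```javascript
// Explicit conversions using the built-in String() and Boolean() functions.
const asString = String(23);          // "23"
const asBoolean = Boolean(1);         // true
const emptyIsFalse = Boolean("");     // false: the empty string is falsy
const zeroStringIsTrue = Boolean("0"); // true: any non-empty string is truthy

console.log(asString, asBoolean, emptyIsFalse, zeroStringIsTrue);
```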

Upvotes: 0
