Reputation: 363

Why '===' is slower than comparing char by char when comparing two string in Nodejs

I found that in Nodejs comparing two strings by comparing every char of them is faster than using the statement 'str1 === str2'. What is the reason for this? And in browsers, it's just opposite.

Here is the code that I had tried, the two long strings are equal.Node version is v8.11.3

function createConstantStr(len) {
  let str = "";
  for (let i = 0; i < len; i++) {
    str += String.fromCharCode((i % 54) + 68);
  }

  return str;
}

let str = createConstantStr(1000000);
let str2 = createConstantStr(1000000);

console.time('equal')
console.log(str === str2);
console.timeEnd('equal')

console.time('equal by char')
let flag = true;
for (let i = 0; i < str.length; i++) {
  if (str[i] !== str2[i]) {
    flag = false;
    break;
  }
}

console.log(flag);
console.timeEnd('equal by char');

Upvotes: 14

Answers (3)

hanshenrik

Reputation: 21463

(credits to TheWild at irc://irc.freenode.net/##Javascript for figuring it out)

at least in Firefox and Chrome and Node, str and str2 is lazy-initialized, the actual createConstantStr() call is ran when the result is actually needed, not when you tell js to create it. if you change it to

let str = createConstantStr(1000000);
let str2 = createConstantStr(1000000);
console.log(str[10],str2[20]);

then the strings will be created in the console.log() call, and you'll get more sane results in your benchmarks, and === will indeed be faster. (much faster on my laptop, from 5ms by-char to <1 ms with ===)

original message:

i don't have an answer but i just want to add that i can reproduce it in firefox 60.6.3esr (64-bit), === is about 28-31 milliseconds and by char is about 3-6 milliseconds:

function test(){
let createConstantStr=function(len) {
  let str = "";
  for (let i = 0; i < len; i++) {
    str += String.fromCharCode((i % 54) + 68);
  }

  return str;
};

let str = createConstantStr(1000000);
let str2 = createConstantStr(1000000);

console.time('equal')
console.log(str === str2);
console.timeEnd('equal')

console.time('equal by char')
let flag = true;
for (let i = 0; i < str.length; i++) {
  if (str[i] !== str2[i]) {
    flag = false;
    break;
  }
}

console.log(flag);
console.timeEnd('equal by char');
};

firefox results:

test();
equal: 28ms
equal by char: 6ms
undefined
test();
equal: 29ms
equal by char: 4ms
undefined
test();
equal: 29ms
equal by char: 3ms
undefined
test();
equal: 31ms
equal by char: 5ms
undefined
test();
equal: 28ms
equal by char: 4ms
undefined

and i can reproduce it in

Google Chrome   74.0.3729.131 (Official Build) (64-bit) (cohort: Stable)
Revision    518a41c1fa7ce1c8bb5e22346e82e42b4d76a96f-refs/branch-heads/3729@{#954}
JavaScript  V8 7.4.288.26

chrome results:

test();
true
equal: 23.493896484375ms
true
equal by char: 11.197021484375ms
undefined
test();
true
equal: 22.749755859375ms
true
equal by char: 11.500244140625ms
undefined
test();
true
equal: 24.43505859375ms
true
equal by char: 11.48291015625ms
undefined
test();
true
equal: 23.84521484375ms
true
equal by char: 11.38720703125ms
undefined
test();
true
equal: 21.8798828125ms
true
equal by char: 11.0390625ms
undefined
test();
true
equal: 23.989013671875ms
true
equal by char: 10.934814453125ms
undefined

laptop rolling Intel Core i7 6700 and 1200MHz ram, not running on battery.

Upvotes: 0

Louis

Reputation: 151380

It's been already pointed out to you that if you flip your two tests around then comparing with === will be faster than comparing char by char. The explanations you've gotten so far as to why have not precisely circumscribed why that is. There are a few issues that affect your results.

The first `console.log` call is expensive

If I try this:

console.time("a");
console.log(1 + 2);
console.timeEnd("a");

console.time("b");
console.log("foo");
console.timeEnd("b");

I get something like:

3
a: 3.864ms
foo
b: 0.050ms

If I flip code around so that I have this:

console.time("b");
console.log("foo");
console.timeEnd("b");

console.time("a");
console.log(1 + 2);
console.timeEnd("a");

Then I get something like this:

foo
b: 3.538ms
3
a: 0.330ms

If I modify the code by adding a console.log before any timing is taken, like this:

console.log("start");

console.time("a");
console.log(1 + 2);
console.timeEnd("a");

console.time("b");
console.log("foo");
console.timeEnd("b");

Then I get something like:

start
3
a: 0.422ms
foo
b: 0.027ms

By putting a console.log before starting the timings, I'm excluding the initial cost of calling console.log from the timings.

The way you have set up your test, the first console.log call is done by which ever of the === or char-by-char test comes first and this the cost of the first console.log call is added to that test. Whichever test comes second does not bear that cost. Ultimately, for a test like this, I'd rather move console.log outside the region that is being timed. For instance the first timed region could be written like this:

console.time('equal');
const result1 = str === str2;
console.timeEnd('equal');
console.log(result1);

Storing the result in result1 and then using console.log(result1) outside the timed region ensures that you can see the result while at the same time not counting the cost incurred by console.log.

Whichever of your tests comes first bears the cost of flattening the string trees internally created by v8

Node uses the v8 JavaScript engine for running your JavaScript. v8 implements a string in multiple ways. objects.h shows in a comment the class hierarchy that v8 supports. Here is the section relevant to strings:

//       - String
//         - SeqString
//           - SeqOneByteString
//           - SeqTwoByteString
//         - SlicedString
//         - ConsString
//         - ThinString
//         - ExternalString
//           - ExternalOneByteString
//           - ExternalTwoByteString
//         - InternalizedString
//           - SeqInternalizedString
//             - SeqOneByteInternalizedString
//             - SeqTwoByteInternalizedString
//           - ConsInternalizedString
//           - ExternalInternalizedString
//             - ExternalOneByteInternalizedString
//             - ExternalTwoByteInternalizedString

There are two classes important for our discussion: SeqString and ConsString. They differ in how they store the string in memory. The SeqString class is a straightforward implementation: the string is simply an array of characters. (Actually SeqString itself is abstract. The real classes are SeqOneByteString and SeqTwoByteString but that's not important here.) ConsString however stores a string as a binary tree. A ConcString has a first field and a second field which are pointers to other strings.

Consider this code:

let str = "";
for (let i = 0; i < 10; ++i) {
  str += i;
}
console.log(str);

If v8 used SeqString to implement the code above, then:

At iteration 0, it would have to allocate a new string of size 1, copy to it the old value of str ("") and append to that "0" and set str to the new string ("0").
At iteration 1, it would have to allocate a new string of size 2, copy to it the old value of str ("0") and append to that "1") and set str to the new string ("01").
...
At iteration 9, it would have to allocate a new string of size 10, copy to it the old value of str ("012345678") and append to that "9" and set str to the new string ("0123456789").

The total number of characters copied for the 10 steps is 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 = 55 characters. 55 characters moved around for a string that in the end contains 10 characters.

Instead v8 actually uses ConsString like this:

At iteration 0, allocate a new ConcString with first set to the old value of str, and second set to i (as a string) 0, and set str to this new ConcString that was just allocated.
At iteration 1, allocate a new ConcString with first set to the old value of str, and second set to "1", and set str to this new ConcString that was just allocated.
...
At iteration 9, allocate a new ConcString with first set to the old value of str, and second set to "9".

If we represent each ConcString as (<first>, <second>) where <first> is the contents of its first field and <second> is the content of the second field, then the end result is this:

(((((((((("", "0"), "1"), "2"), "3"), "4"), "5"), "6"), "7"), "8"), "9")

By doing things this way v8 avoids having to copy strings over and over again from step to step. Each step is just one allocation and adjusting a couple of pointers. While storing strings as a tree helps make concatenations fast, it has a downside in that other operations become slower. v8 mitigates this by flattening ConsString trees. After flattening the example above, it becomes:

("0123456789", "")

Note that when a ConsString is flattened, this very ConsString object is mutated. (From the standpoint of the JS code, the string remains the same. Only its internal v8 representation has changed.) It is easier to compare flattened ConsString trees, and indeed this is exactly what v8 does (ref):

bool String::Equals(Isolate* isolate, Handle<String> one, Handle<String> two) {
  if (one.is_identical_to(two)) return true;
  if (one->IsInternalizedString() && two->IsInternalizedString()) {
    return false;
  }
  return SlowEquals(isolate, one, two);
}

The strings we're talking about are not internalized so SlowEquals is called (ref):

bool String::SlowEquals(Isolate* isolate, Handle<String> one,
                        Handle<String> two) {
[... some shortcuts are attempted ...]
  one = String::Flatten(isolate, one);
  two = String::Flatten(isolate, two);

I've shown here that comparing strings for equality flattens them internally, but calls to String::Flatten are found in many other places. Both of your tests end up flattening the strings, through different means.

For your code, the upshot is this:

Your createConstantStr creates strings which are internally stored as a ConsString. So str and str2 are ConsString objects, as far as v8 is concerned.
The first test you run causes str and str2 to be flattened so: a) this test has to carry the cost of flattening the strings, b) the second test benefits from working with ConcString objects that are already flattened. (Remember, when a ConcString object is flattened, this very object is mutated. So if it is accessed again later, it is already flattened.)

Upvotes: 27

1565986223

Reputation: 6718

I reversed the comparison operation and looks like 0 ms (sometimes 1 ms) on === (firefox). So probably something to do with compiler internals trying to optimize. Something like, hey the strings are the same in the second comparison operation and I've already compared them. So I'll re-use the result.

This youtube video explains best.

function createConstantStr(len) {
  let str = "";
  for (let i = 0; i < len; i++) {
    str += String.fromCharCode((i % 54) + 68);
  }

  return str;
}

let str = createConstantStr(1000000);
let str2 = createConstantStr(1000000);

console.time('equal by char')
let flag = true;
for (let i = 0; i < str.length; i++) {
  if (str[i] !== str2[i]) {
    flag = false;
    break;
  }
}

console.log(flag);
console.timeEnd('equal by char');

console.time('equal')
console.log(str === str2);
console.timeEnd('equal')

Upvotes: 4

Why &#39;===&#39; is slower than comparing char by char when comparing two string in Nodejs

Answers (3)

The first console.log call is expensive

Whichever of your tests comes first bears the cost of flattening the string trees internally created by v8

Related Questions

Why '===' is slower than comparing char by char when comparing two string in Nodejs

The first `console.log` call is expensive