Reputation: 363
I found that in Nodejs comparing two strings by comparing every char of them is faster than using the statement 'str1 === str2'. What is the reason for this? And in browsers, it's just opposite.
Here is the code that I had tried, the two long strings are equal.Node version is v8.11.3
function createConstantStr(len) {
let str = "";
for (let i = 0; i < len; i++) {
str += String.fromCharCode((i % 54) + 68);
}
return str;
}
let str = createConstantStr(1000000);
let str2 = createConstantStr(1000000);
console.time('equal')
console.log(str === str2);
console.timeEnd('equal')
console.time('equal by char')
let flag = true;
for (let i = 0; i < str.length; i++) {
if (str[i] !== str2[i]) {
flag = false;
break;
}
}
console.log(flag);
console.timeEnd('equal by char');
Upvotes: 14
Views: 841
Reputation: 21463
(credits to TheWild at irc://irc.freenode.net/##Javascript for figuring it out)
at least in Firefox and Chrome and Node, str and str2 is lazy-initialized, the actual createConstantStr() call is ran when the result is actually needed, not when you tell js to create it. if you change it to
let str = createConstantStr(1000000);
let str2 = createConstantStr(1000000);
console.log(str[10],str2[20]);
then the strings will be created in the console.log() call, and you'll get more sane results in your benchmarks, and ===
will indeed be faster. (much faster on my laptop, from 5ms by-char to <1 ms with ===)
original message:
i don't have an answer but i just want to add that i can reproduce it in firefox 60.6.3esr (64-bit), === is about 28-31 milliseconds and by char is about 3-6 milliseconds:
function test(){
let createConstantStr=function(len) {
let str = "";
for (let i = 0; i < len; i++) {
str += String.fromCharCode((i % 54) + 68);
}
return str;
};
let str = createConstantStr(1000000);
let str2 = createConstantStr(1000000);
console.time('equal')
console.log(str === str2);
console.timeEnd('equal')
console.time('equal by char')
let flag = true;
for (let i = 0; i < str.length; i++) {
if (str[i] !== str2[i]) {
flag = false;
break;
}
}
console.log(flag);
console.timeEnd('equal by char');
};
firefox results:
test();
equal: 28ms
equal by char: 6ms
undefined
test();
equal: 29ms
equal by char: 4ms
undefined
test();
equal: 29ms
equal by char: 3ms
undefined
test();
equal: 31ms
equal by char: 5ms
undefined
test();
equal: 28ms
equal by char: 4ms
undefined
and i can reproduce it in
Google Chrome 74.0.3729.131 (Official Build) (64-bit) (cohort: Stable)
Revision 518a41c1fa7ce1c8bb5e22346e82e42b4d76a96f-refs/branch-heads/3729@{#954}
JavaScript V8 7.4.288.26
chrome results:
test();
true
equal: 23.493896484375ms
true
equal by char: 11.197021484375ms
undefined
test();
true
equal: 22.749755859375ms
true
equal by char: 11.500244140625ms
undefined
test();
true
equal: 24.43505859375ms
true
equal by char: 11.48291015625ms
undefined
test();
true
equal: 23.84521484375ms
true
equal by char: 11.38720703125ms
undefined
test();
true
equal: 21.8798828125ms
true
equal by char: 11.0390625ms
undefined
test();
true
equal: 23.989013671875ms
true
equal by char: 10.934814453125ms
undefined
Upvotes: 0
Reputation: 151380
It's been already pointed out to you that if you flip your two tests around then comparing with ===
will be faster than comparing char by char. The explanations you've gotten so far as to why have not precisely circumscribed why that is. There are a few issues that affect your results.
console.log
call is expensiveIf I try this:
console.time("a");
console.log(1 + 2);
console.timeEnd("a");
console.time("b");
console.log("foo");
console.timeEnd("b");
I get something like:
3
a: 3.864ms
foo
b: 0.050ms
If I flip code around so that I have this:
console.time("b");
console.log("foo");
console.timeEnd("b");
console.time("a");
console.log(1 + 2);
console.timeEnd("a");
Then I get something like this:
foo
b: 3.538ms
3
a: 0.330ms
If I modify the code by adding a console.log
before any timing is taken, like this:
console.log("start");
console.time("a");
console.log(1 + 2);
console.timeEnd("a");
console.time("b");
console.log("foo");
console.timeEnd("b");
Then I get something like:
start
3
a: 0.422ms
foo
b: 0.027ms
By putting a console.log
before starting the timings, I'm excluding the initial cost of calling console.log
from the timings.
The way you have set up your test, the first console.log
call is done by which ever of the ===
or char-by-char test comes first and this the cost of the first console.log
call is added to that test. Whichever test comes second does not bear that cost. Ultimately, for a test like this, I'd rather move console.log
outside the region that is being timed. For instance the first timed region could be written like this:
console.time('equal');
const result1 = str === str2;
console.timeEnd('equal');
console.log(result1);
Storing the result in result1
and then using console.log(result1)
outside the timed region ensures that you can see the result while at the same time not counting the cost incurred by console.log
.
Node uses the v8 JavaScript engine for running your JavaScript. v8 implements a string in multiple ways. objects.h
shows in a comment the class hierarchy that v8 supports. Here is the section relevant to strings:
// - String
// - SeqString
// - SeqOneByteString
// - SeqTwoByteString
// - SlicedString
// - ConsString
// - ThinString
// - ExternalString
// - ExternalOneByteString
// - ExternalTwoByteString
// - InternalizedString
// - SeqInternalizedString
// - SeqOneByteInternalizedString
// - SeqTwoByteInternalizedString
// - ConsInternalizedString
// - ExternalInternalizedString
// - ExternalOneByteInternalizedString
// - ExternalTwoByteInternalizedString
There are two classes important for our discussion: SeqString
and ConsString
. They differ in how they store the string in memory. The SeqString
class is a straightforward implementation: the string is simply an array of characters. (Actually SeqString
itself is abstract. The real classes are SeqOneByteString
and SeqTwoByteString
but that's not important here.) ConsString
however stores a string as a binary tree. A ConcString
has a first
field and a second
field which are pointers to other strings.
Consider this code:
let str = "";
for (let i = 0; i < 10; ++i) {
str += i;
}
console.log(str);
If v8 used SeqString
to implement the code above, then:
At iteration 0, it would have to allocate a new string of size 1, copy to it the old value of str
(""
) and append to that "0"
and set str
to the new string ("0"
).
At iteration 1, it would have to allocate a new string of size 2, copy to it the old value of str
("0"
) and append to that "1"
) and set str
to the new string ("01"
).
...
At iteration 9, it would have to allocate a new string of size 10, copy to it the old value of str
("012345678"
) and append to that "9"
and set str
to the new string ("0123456789"
).
The total number of characters copied for the 10 steps is 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 = 55 characters. 55 characters moved around for a string that in the end contains 10 characters.
Instead v8 actually uses ConsString
like this:
At iteration 0, allocate a new ConcString
with first
set to the old value of str
, and second
set to i
(as a string) 0
, and set str
to this new ConcString
that was just allocated.
At iteration 1, allocate a new ConcString
with first
set to the old value of str
, and second
set to "1"
, and set str
to this new ConcString
that was just allocated.
...
At iteration 9, allocate a new ConcString
with first
set to the old value of str
, and second
set to "9"
.
If we represent each ConcString
as (<first>, <second>)
where <first>
is the contents of its first
field and <second>
is the content of the second
field, then the end result is this:
(((((((((("", "0"), "1"), "2"), "3"), "4"), "5"), "6"), "7"), "8"), "9")
By doing things this way v8 avoids having to copy strings over and over again from step to step. Each step is just one allocation and adjusting a couple of pointers. While storing strings as a tree helps make concatenations fast, it has a downside in that other operations become slower. v8 mitigates this by flattening ConsString
trees. After flattening the example above, it becomes:
("0123456789", "")
Note that when a ConsString
is flattened, this very ConsString
object is mutated. (From the standpoint of the JS code, the string remains the same. Only its internal v8 representation has changed.)
It is easier to compare flattened ConsString
trees, and indeed this is exactly what v8 does (ref):
bool String::Equals(Isolate* isolate, Handle<String> one, Handle<String> two) {
if (one.is_identical_to(two)) return true;
if (one->IsInternalizedString() && two->IsInternalizedString()) {
return false;
}
return SlowEquals(isolate, one, two);
}
The strings we're talking about are not internalized so SlowEquals
is called (ref):
bool String::SlowEquals(Isolate* isolate, Handle<String> one,
Handle<String> two) {
[... some shortcuts are attempted ...]
one = String::Flatten(isolate, one);
two = String::Flatten(isolate, two);
I've shown here that comparing strings for equality flattens them internally, but calls to String::Flatten
are found in many other places. Both of your tests end up flattening the strings, through different means.
For your code, the upshot is this:
Your createConstantStr
creates strings which are internally stored as a ConsString
. So str
and str2
are ConsString
objects, as far as v8 is concerned.
The first test you run causes str
and str2
to be flattened so: a) this test has to carry the cost of flattening the strings, b) the second test benefits from working with ConcString
objects that are already flattened. (Remember, when a ConcString
object is flattened, this very object is mutated. So if it is accessed again later, it is already flattened.)
Upvotes: 27
Reputation: 6718
I reversed the comparison operation and looks like 0 ms
(sometimes 1 ms) on ===
(firefox). So probably something to do with compiler internals trying to optimize. Something like, hey the strings
are the same in the second comparison operation and I've already compared them. So I'll re-use the result.
This youtube video explains best.
function createConstantStr(len) {
let str = "";
for (let i = 0; i < len; i++) {
str += String.fromCharCode((i % 54) + 68);
}
return str;
}
let str = createConstantStr(1000000);
let str2 = createConstantStr(1000000);
console.time('equal by char')
let flag = true;
for (let i = 0; i < str.length; i++) {
if (str[i] !== str2[i]) {
flag = false;
break;
}
}
console.log(flag);
console.timeEnd('equal by char');
console.time('equal')
console.log(str === str2);
console.timeEnd('equal')
Upvotes: 4