I'm running my Z80 emulator both in Chrome and in Node. I get about 10x the performance in Chrome that I do in Node. (100k Z80 instructions take 6 ms in Chrome and 60 ms in Node.) I've run the profiler:
% node --prof index.js
% node --prof-process isolate-0x108000000-25550-v8.log
and it says that 95% of the time is spent in C++:
[Summary]:
   ticks  total  nonlib   name
    103    3.8%    3.8%  JavaScript
   2604   95.2%   95.8%  C++
      6    0.2%    0.2%  GC
     17    0.6%          Shared libraries
     12    0.4%          Unaccounted
The C++ breakdown is:
[C++ entry points]:
   ticks    cpp   total   name
   2127   98.3%   77.7%  T __ZN2v88internal40Builtin_CallSitePrototypeGetPromiseIndexEiPmPNS0_7IsolateE
     32    1.5%    1.2%  T __ZN2v88internal21Builtin_HandleApiCallEiPmPNS0_7IsolateE
I've tracked down CallSitePrototypeGetPromiseIndex to this source file. I'm not using promises, async, or await in my code. My test is just a tight loop of 100k emulated Z80 instructions, with no I/O or anything else.
I've found others online using the --prof flag, and none of them see this in their results. Is it a side effect of profiling? Am I somehow triggering promises inside the loop? Is there any reason Node should be this much slower than Chrome?
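One way to check whether the profiler itself is responsible is to time the same loop without --prof. The sketch below is a minimal, hypothetical harness, not the emulator from the question: step() stands in for executing one instruction, and memory is a zero-filled 64 KB array, so every fetched opcode is 0 (NOP). It uses Node's process.hrtime.bigint(); in Chrome, performance.now() plays the same role.

```javascript
// Hypothetical stand-in for the emulator state; only NOP is modeled.
const memory = new Uint8Array(0x10000); // 64 KB of zeros, so every opcode is NOP
const cpu = { pc: 0, cycles: 0 };

// Fetch and execute a single instruction.
function step() {
  const opcode = memory[cpu.pc];
  cpu.pc = (cpu.pc + 1) & 0xffff; // 16-bit program counter wraps at 64 KB
  if (opcode === 0x00) {
    cpu.cycles += 4; // NOP takes 4 T-states on a Z80
  } else {
    throw new Error(`Unimplemented opcode 0x${opcode.toString(16)}`);
  }
}

// Run `count` instructions and return the elapsed wall-clock time in ms.
function benchmark(count) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < count; i++) {
    step();
  }
  const end = process.hrtime.bigint();
  return Number(end - start) / 1e6; // nanoseconds -> milliseconds
}

console.log(`100k instructions: ${benchmark(100000).toFixed(2)} ms`);
```

Running it once with plain node and once with node --prof gives a rough idea of how much of the gap is profiler overhead.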
Details: Node v12.13.1, Chrome 79.0.3945.88.
Okay, this surprisingly similar question had a great answer by Esailija pointing me to this line in the V8 source code, which limits optimization of switch statements to those under a certain size. The first thing my emulator does is dispatch on the opcode with a 256-case switch. In my test I'm only passing it 0 (NOP), so it was safe to comment out large chunks of the cases. It turns out that if I comment out 13 of the cases, performance jumps by a factor of 25! If I comment out only 12, I get the slow performance.
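For illustration, a dispatch of that kind has roughly the shape below. This is a hypothetical, heavily abbreviated sketch rather than the emulator's code: the real switch has one case for each of the 256 opcodes, flags are not modeled, and a handful of cases like these will not by themselves reproduce the slowdown.

```javascript
// Hypothetical, abbreviated CPU state (flags not modeled).
const cpu = { pc: 0, a: 0, b: 0, cycles: 0 };

// One large switch over the opcode byte. The real decoder has a case for
// every opcode from 0x00 to 0xFF; only four are shown here.
function execute(opcode) {
  switch (opcode) {
    case 0x00: // NOP
      cpu.cycles += 4;
      break;
    case 0x04: // INC B
      cpu.b = (cpu.b + 1) & 0xff;
      cpu.cycles += 4;
      break;
    case 0x3c: // INC A
      cpu.a = (cpu.a + 1) & 0xff;
      cpu.cycles += 4;
      break;
    case 0x78: // LD A,B
      cpu.a = cpu.b;
      cpu.cycles += 4;
      break;
    // ... the remaining cases are omitted ...
    default:
      throw new Error(`Unimplemented opcode 0x${opcode.toString(16)}`);
  }
  cpu.pc = (cpu.pc + 1) & 0xffff;
}
```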
The link into the V8 source code above is fairly old (2013), so I tried to find the modern equivalent. I didn't find a hard limit, but I did find several heuristics that decide between a table lookup and a tree (binary search) lookup (ia32, x86). When I plug in my numbers, though, they don't land near the borderline at the point where I saw the performance cliff, so I'm not sure this is the actual cause, or whether some other optimization is failing to trigger elsewhere.
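The two strategies those heuristics choose between can be modeled in plain JavaScript. This is only a conceptual illustration of what the generated machine code does, not V8's implementation or its cost model: a jump table indexes directly by value minus min (constant time, but one slot per value in the range), while a binary-search tree compares against the sorted case values (logarithmic time, but storage proportional to the number of cases). The function names are hypothetical.

```javascript
// Conceptual model of two ways a compiler can lower a switch.
// Each case value maps to a handler index; -1 means "default".

// Jump table: O(1) dispatch, but the table size grows with the value
// range [min, max], not with the number of cases.
function buildJumpTable(caseValues, min, max) {
  const table = new Int32Array(max - min + 1).fill(-1);
  caseValues.forEach((value, index) => {
    table[value - min] = index;
  });
  return (value) => (value < min || value > max ? -1 : table[value - min]);
}

// Binary search: O(log n) comparisons per dispatch, but storage is
// proportional to the number of cases only.
function buildBinarySearch(caseValues) {
  const sorted = caseValues
    .map((value, index) => ({ value, index }))
    .sort((x, y) => x.value - y.value);
  return (value) => {
    let lo = 0;
    let hi = sorted.length - 1;
    while (lo <= hi) {
      const mid = (lo + hi) >> 1;
      if (sorted[mid].value === value) return sorted[mid].index;
      if (sorted[mid].value < value) lo = mid + 1;
      else hi = mid - 1;
    }
    return -1;
  };
}

// Both lowerings must agree on which case (if any) a value selects.
const cases = [0x00, 0x04, 0x3c, 0x78];
const viaTable = buildJumpTable(cases, 0x00, 0xff);
const viaSearch = buildBinarySearch(cases);
console.log(viaTable(0x3c), viaSearch(0x3c)); // 2 2
```

A dense set of cases such as a full 0x00-0xFF opcode map is the textbook case for a table; sparse case values are where the search tree wins on size.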
As for the difference from Chrome, Node v12.13.1 and Chrome 79 ship different versions of V8, so there's probably some subtle difference in when and how each decides to optimize switch statements.
I'm not sure what the best solution is here, but clearly I need to avoid large switch statements. I'll either split it into a sequence of smaller switch statements or replace the whole thing with an array of functions.
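A sketch of the first option, under stated assumptions: the Z80 opcode map splits naturally on its top two bits (0x00-0x3F, 0x40-0x7F for LD r,r', 0x80-0xBF for 8-bit arithmetic and logic, 0xC0-0xFF), so a four-way outer switch can hand each opcode to one of four switches with at most 64 cases each. The function names (execGroup0 through execGroup3) and the tiny subset of opcodes are hypothetical, and whether 64-case switches stay clear of the cliff in a given V8 version would have to be measured.

```javascript
// Hypothetical, abbreviated CPU state (flags not modeled).
const cpu = { pc: 0, a: 0, b: 0, cycles: 0 };

function unimplemented(opcode) {
  throw new Error(`Unimplemented opcode 0x${opcode.toString(16)}`);
}

// Opcodes 0x00-0x3F: relative jumps, 16-bit loads, INC/DEC, rotates, ...
function execGroup0(opcode) {
  switch (opcode) {
    case 0x00: // NOP
      break;
    case 0x04: // INC B
      cpu.b = (cpu.b + 1) & 0xff;
      break;
    default:
      unimplemented(opcode);
  }
}

// Opcodes 0x40-0x7F: LD r,r' (0x76 is HALT on a real Z80).
function execGroup1(opcode) {
  switch (opcode) {
    case 0x78: // LD A,B
      cpu.a = cpu.b;
      break;
    default:
      unimplemented(opcode);
  }
}

// Opcodes 0x80-0xBF: 8-bit arithmetic and logic on A.
function execGroup2(opcode) {
  switch (opcode) {
    case 0x80: // ADD A,B
      cpu.a = (cpu.a + cpu.b) & 0xff;
      break;
    default:
      unimplemented(opcode);
  }
}

// Opcodes 0xC0-0xFF: conditional jumps/calls/returns, stack ops, prefixes.
function execGroup3(opcode) {
  unimplemented(opcode);
}

// A small outer switch on the top two bits picks one of four smaller switches.
function execute(opcode) {
  switch (opcode >> 6) {
    case 0: execGroup0(opcode); break;
    case 1: execGroup1(opcode); break;
    case 2: execGroup2(opcode); break;
    default: execGroup3(opcode); break;
  }
  cpu.cycles += 4; // every opcode implemented here takes 4 T-states
  cpu.pc = (cpu.pc + 1) & 0xffff;
}
```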
Update: I used an array of functions and my entire program sped up by a factor of 25.
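A minimal sketch of the array-of-functions approach, using the same kind of hypothetical CPU state (flags not modeled, only three opcodes implemented, names illustrative rather than the author's): every slot of a 256-entry table starts as a fallback that throws, the implemented opcodes overwrite their slots, and dispatch becomes one indexed call with no switch at all.

```javascript
// Hypothetical, abbreviated CPU state (flags not modeled).
const cpu = { pc: 0, a: 0, b: 0, cycles: 0 };

// Fallback for opcodes that have not been implemented yet.
function unimplemented(opcode) {
  throw new Error(`Unimplemented opcode 0x${opcode.toString(16)}`);
}

// One handler per opcode byte; every slot starts out as the fallback.
const handlers = new Array(256).fill(unimplemented);

handlers[0x00] = () => { // NOP
  cpu.cycles += 4;
};
handlers[0x04] = () => { // INC B
  cpu.b = (cpu.b + 1) & 0xff;
  cpu.cycles += 4;
};
handlers[0x78] = () => { // LD A,B
  cpu.a = cpu.b;
  cpu.cycles += 4;
};

// Dispatch is a single array lookup plus a call.
function execute(opcode) {
  handlers[opcode](opcode);
  cpu.pc = (cpu.pc + 1) & 0xffff;
}
```

Because every slot holds a function, an unimplemented opcode fails loudly instead of attempting to call undefined, and opcodes can be added one assignment at a time.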