Ben

Reputation: 11502

Performance gain from double counter in a js "for" loop?

While researching array assignment performance, I stumbled across this jsperf which purportedly shows a very dramatic speed increase for for loops of the form:

var i, j = 0;
for (i = 0; i < n; i++) {
  myArray[j++] = i;
}

... in current browsers. This "double increment" form is totally alien to me, and seems bizarre. I can't find any discussion on it.

Many people have warned over the years about misleading data from microbenchmarks due to the increasing cleverness of compiler optimizations. Is this one of those cases? Why or why not? If the jsperf is written incorrectly, I would love to see a correct, revised jsperf that more accurately reflects real world conditions.

Upvotes: 2

Views: 196

Answers (1)

user4842163

There should be no way the double-counter loop is faster than a single-counter loop unless the JS engine is doing something extremely poor/bizarre with optimization. From a machine-level standpoint, such code would normally use two general-purpose registers instead of one and have to increment both. The difference should be very small and generally insignificant, but the single counter should have a slight performance edge.
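
For reference, this is the single-counter form I'm comparing against, just a sketch reusing the question's names:

var i;
for (i = 0; i < n; i++) {
  myArray[i] = i; // one counter serves as both loop index and write index
}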

The only way to tell for sure is to see the resulting machine instructions/disassembly.

It's possible that reading length isn't as simple as just accessing a variable. It would surprise me somewhat if that's the case, but it may translate to more instructions (ex: involving branching in worst-case scenarios). Even then, you should still get slightly faster results just using i instead of j, i.e., still a single-counter loop.
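
For illustration, here's one length-based form the original test might have used (a guess on my part, not necessarily the jsperf's exact code), as opposed to the plain-counter write above:

// Appending via length: the array's length has to be read back on every
// iteration before the write index is known.
for (i = 0; i < n; i++) {
  myArray[myArray.length] = i;
}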

One thing worth trying is to change the order of your tests. It's possible that memory left over from the previous tests (paging/caching effects) is letting that later, double-counter test go faster. For example, that double-counter test is immediately followed by a single-counter test; try swapping the order in which those two execute and see if it affects the results.

Update

As shown in Mark's test in the comments, push-numbers-redux, he does get faster results with the single-counter loop when avoiding length. Apparently length does require more instructions than a simple variable, perhaps with some branching for associative-array cases that the optimizer can't eliminate. But a single-counter loop will still beat the double-counter one if we're comparing plain variables against plain variables.

Microbenchmarks

Since the question also raised why micro-benchmarks can be somewhat bad: they're not always bad, but there are some caveats associated with them. I would say your test is pretty micro, since it isn't doing anything meaningful from a user's point of view. It creates an array of data only to do nothing with it.

As for why this is risky if you're trying to generalize performance conclusions from such tests: for a start, the hardware is a dynamic machine. It tries to predict what you're doing, which branches of code will be executed more often, what to pull from DRAM into cache, and so on. The operating system is also dynamic, paging memory on the fly. Given that dynamic environment, you can easily write a micro-test that appears faster but is simply getting lucky with these dynamic factors. Real-world tests that do more work, and perhaps more importantly a variety of work, tend to mitigate that "dynamic luck" factor, which can mislead you into believing something is generally faster when it is only faster for your very specific test. You don't have to write full-blown large-scale applications to get to something real-world-like. For example, computing a Mandelbrot set is rather simple code (it can fit on one page) but still does enough work to escape those micro-level dangers.
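
As a rough illustration of the kind of small-but-real workload I mean, here is a minimal Mandelbrot sketch (the parameters and mapping constants are arbitrary choices of mine, not taken from any particular benchmark):

// Minimal Mandelbrot escape-time computation: small code, non-trivial work.
function mandelbrot(width, height, maxIter) {
  var counts = new Array(width * height);
  for (var py = 0; py < height; py++) {
    for (var px = 0; px < width; px++) {
      // Map the pixel to a point in the complex plane around the set.
      var cx = (px / width) * 3.5 - 2.5;
      var cy = (py / height) * 2.0 - 1.0;
      var x = 0, y = 0, iter = 0;
      while (x * x + y * y <= 4 && iter < maxIter) {
        var xNew = x * x - y * y + cx;
        y = 2 * x * y + cy;
        x = xNew;
        iter++;
      }
      counts[py * width + px] = iter;
    }
  }
  return counts;
}

Calling it with something like mandelbrot(256, 192, 100) and actually consuming the returned counts (summing and printing them, say) also keeps the optimizer from discarding the work, which ties into the next point.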

Another danger is the optimizing compiler itself. You can get bitten in micro-benchmarks when the optimizer detects that you're causing no observable side effects at all (ex: computing data only to discard it, without printing it or using it to change anything elsewhere). Sophisticated optimizers can detect when certain computations can be skipped because they have no side effects, and you might sometimes find that a change made things run 10,000 times faster, not because the actual work got faster, but because the optimizer figured out the work didn't need to be done at all and skipped it outright. If, putting yourself in the user's shoes, you can rationalize why parts of your code could be skipped while still giving the user the same output/result without waiting as long, then the optimizer can rationalize it too and skip that code. Whenever you're micro-benchmarking against a strong optimizer and get a result that seems too good to be true, it probably is: the optimizer noticed that your superficial test doesn't actually need the work done and skipped it.
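
A minimal sketch of the kind of guard I mean, consuming and reporting the result so the work has an observable effect (the checksum idea is just a common convention, nothing specific to jsperf):

// Without a consumed result, a strong optimizer may drop the whole loop.
function fillArray(n) {
  var myArray = [];
  for (var i = 0; i < n; i++) {
    myArray[i] = i;
  }
  return myArray;
}

// Summing and printing the result gives the work an observable side effect.
var checksum = 0;
var filled = fillArray(100000);
for (var k = 0; k < filled.length; k++) {
  checksum += filled[k];
}
console.log(checksum);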

Last but not least, focusing on micro-efficiencies will often pull you into assembly-like thinking. There's no wiggle room to optimize in smarter ways when you're just repeatedly testing a few instructions in a tight loop. Generally you want more wiggle room when staring at performance measurements, working your way down from coarse optimizations (algorithmic, multithreading, etc.) to the smallest micro-optimizations. When your tests are micro, you jump straight to the most granular, metal-scraping kind of thinking, and that can lead to an unhealthy obsession with saving a few clock cycles when the real world might present an opportunity to save billions with the same effort/time invested.

Upvotes: 4
