marczellm
marczellm

Reputation: 1336

Speed ratio of algorithm versus precompiled reference implementation differs across computers

We have a small C++ project with the following architecture.

These two were compiled into a DLL:

Then another implementation of the same algorithm is written by someone else.

The main() function does this:

We found that running the very same code and DLL on different computers returned quite different speed ratios. On one computer an implementation scored 6.4, and the very same implementation scored 2.8 on another machine. How could that be?

Upvotes: 1

Views: 100

Answers (1)

Cornstalks
Cornstalks

Reputation: 38218

There could be tons of factors, but here are a few:

  • CPU cache can be a big one. Different processors have different caches (and not just in terms of raw cache size, but also caching strategies). One might be "smarter" than the other, or perhaps one just happens to work better than another in this specific situation.
  • CPU pipelining. Instructions these days are interleaved in the CPU, even in a single thread of execution. The way the CPU pipeline works varies from CPU to CPU, and one CPU might be able to two particular things at once, while another CPU can't. If one of the implementations exploit this, then it gets a speed boost (or if they both do, then they both get closer to the same speed).
  • CPU instruction execution times may vary. So one CPU executing the exact same instructions as another CPU might be able to do each one faster than the other CPU. If one computer's CPU takes a longer time to use a particular instruction (and one of the implementations happens to use that instruction), while another CPU has been improved to speed up that instruction's execution time, then there will be a larger time discrepancy.
  • Branch prediction models in the CPUs might be different, and one implementation might be more or less friendly to a particular CPU's branch prediction model.
  • Operating systems can affect this in many ways, from memory allocation strategies (maybe one OS has a memory allocation strategy that causes a bigger discrepancy in times, while another OS has a different allocation strategy that minimizes the discrepancy), to CPU time slice management (are the algorithms multithreaded, for example?).

Upvotes: 1

Related Questions