Reputation: 8970
I am writing an algorithm to find the inverse of an nxn matrix. Let us take the specific case of a 3x3 matrix.
When you invert a matrix by hand, you typically look for rows/columns containing one or more zeros to make the determinant calculation faster as it eliminates terms you need to calculate.
Following this logic in C/C++, if you identify a row/column with one or more zeros, you will end up with the following code:
float term1 = currentElement * DetOf2x2(...);
// ^
// This is equal to 0.
//
// float term2 = ... and so on.
As the compiler cannot know currentElement
will be zero at compile time, it cannot be optimised to something like float term = 0;
and thus the floating point multiplication will be carried out at runtime.
My question is, will these zero values make the floating point multiplication faster, or will the multiplication take the same amount of time regardless of the value of currentElement
? If there is no way of optimising the multiplication at runtime, then I can remove the logic that searches for rows/columns containing zeros.
Upvotes: 3
Views: 2991
Reputation: 106096
float term1 = currentElement * DetOf2x2(...);
The compiler will call DetOf2x2(...)
even if currentElement is 0: that's sure to be far more costly than the final multiplication, whether by 0 or not. There are multiple reasons for that:
DetOf2x2(...)
may have side effects (like output to a log file) that need to happen even when currentElement
is 0
, andDetOf2x2(...)
may return values like the Not-a-Number / NaN sentinel that should propagate to term1
anyway (as noted first by Nils Pipenbrinck)Given DetOf2x2(...)
is almost certainly working on values that can only be determined at run-time, the latter possibility can't be ruled out at compile time.
If you want to avoid the call to Detof2x2(...)
, try:
float term1 = (currentElement != 0) ? currentElement * DetOf2x2(...) : 0;
Upvotes: 7
Reputation: 570
The following construct is valid at compile time when the compiler can guess the value of "currentElement".
float term1 = currentElement ? currentElement * DetOf2x2(...) : 0;
If it cannot be guessed at compile time, it will be checked at run-time and the performance depends on processor architecture : the trade-off between a branch (include branch latency and the delay to rebuild the instruction pipeline can be up to 10 or 20 cycles) and flat code (some processors run 3 instructions per cycle) and hardware branch prediction (when the hardware supports branch prediction).
Since multiplications throughput is close to 1 cycle on a x86_64 processor, there is no perf differenec depending on operand values like 0.0, 1.0, 2.0 or 12345678.99. if such a difference exists, that would be perceived as a covert channel in cryptographic-style software.
GCC allows to check function parameters at compile time
inline float myFn(float currentElement, myMatrix M)
{
#if __builtin_constant_p(currentElement) && currentElement == 0.0
return 0.0;
#else
return currentElement * det(M);
#endif
}
you need to enable inlining and interprocedural optimizations in the compiler.
Upvotes: 0
Reputation: 86353
The compiler is not allowed to optimize this unless the calculation is trival (e.g. all constants).
The reason is, that DetOf2x2 may return a NAN floating point value. Multiplying a NAN with zero does not return zero but a NAN again.
You can try it yourself using this little test here:
int main (int argc, char **args)
{
// generate a NAN
float a = sqrt (-1);
// Multiply NAN with zero..
float b = 0*a;
// this should *not* output zero
printf ("%f\n", b);
}
If you want to optimize your code, you have to test for zero on your own. The compiler will not do that for you.
Upvotes: 10
Reputation: 15642
Optimisations performed at runtime are known as JIT (just-in-time) optimisations. Optimisations performed at translation (compilation) are known as AOT (ahead-of-time) optimisations. You're referring to JIT optimisations. A compiler might introduce JIT optimisations into your machine code, but it's certainly a far more complex optimisation to implement than the common AOT optimisations. Optimisations are typically implemented based on significance, and this kind of "optimisation" might be seen to affect other algorithms negatively. C implementations aren't required to perform any of these optimisations.
You could provide the optimisation manually, which would be "the logic that searches for rows/columns containing zeros", or something like this: float term1 = currentElement != 0 ? currentElement * DetOf2x2(...) : 0;
Upvotes: 0
Reputation: 5765
Modern CPUs will actually handle a multiply-by-zero very quickly, more quickly than a general multiply, and much more quickly than a branch. Don't even bother trying to optimize this unless that zero is going to propagate through at least several dozen instructions.
Upvotes: 3