Reputation: 5188
I'm consistently finding that the long double datatype is about twice as fast as double for my calculations when using -funsafe-math-optimizations. I'd like some insight into this, because the 80-bit format has long been deprecated, or I might be doing something really dumb with the double datatype. The compiler is g++ 4.8.2 and the target is x86_64 (so gcc prefers SSE2 math unless I use long double).
My code is more or less like this (pseudocode):
// x is an array of floating point numbers (double or long double,
// depending on the build being timed)
for (size_t i = 0; i < x.size(); ++i) {
    double accumulator = 0;
    for (int k = 0; k < kmax; ++k)
        accumulator += A[k]*(B[k]*cos(C*k*x[i]) - D[k]*sin(C*k*x[i]));
    x[i] += F*accumulator;
    // wrap x[i] back into [-1/2, 1/2)
    if (x[i] >= 0.5)       x[i] -= std::trunc(x[i] + 0.5);
    else if (x[i] < -0.5)  x[i] -= std::trunc(x[i] - 0.5);
}
A, B, ... are some precomputed arrays/constants.
The speedup seems unrelated to cache-line problems, because I get the same relative speedup if I parallelize the outer loop with OpenMP.
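For reference, the parallel version is just the same loop under an OpenMP pragma; a minimal sketch, with the arrays and constants assumed declared as above:
#include <cstddef>
#include <vector>

// Same per-element update as above, outer loop split across threads.
// Compile with -fopenmp; g++ 4.8 (OpenMP 3.1) accepts an unsigned index.
void update(std::vector<double>& x) {
    #pragma omp parallel for
    for (std::size_t i = 0; i < x.size(); ++i) {
        // ... per-element update exactly as in the loop body above ...
    }
}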
EDIT: I corrected the pseudocode: notice that cos and sin have the same argument, which in the end is the reason for the speedup (see gsg's answer and the comments).
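Since the two calls share an argument, the optimizer can fuse them into a single sincos call (a GNU extension in glibc; the function below is mine, just to illustrate the fused form):
#include <cmath>

// One sincos() call replaces separate sin() and cos() of the same angle,
// so the expensive argument reduction is performed only once per term.
// sincos() is a GNU extension; g++ on glibc exposes it by default.
double term(double A_k, double B_k, double D_k, double angle) {
    double s, c;
    sincos(angle, &s, &c);
    return A_k * (B_k * c - D_k * s);
}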
Upvotes: 4
Views: 458
Reputation: 9377
My guess is that the difference is due to cos.
The long double math must be compiled into x87 instructions, making it easy and efficient to use the x87 operation fcos. However, there are no transcendental operations for the xmm registers, so a call to cos must either generate code to move a double onto the x87 stack and invoke fcos, or make a function call to do the equivalent work. These are, presumably, more expensive for this compiler and machine.
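To illustrate the x87 path (a sketch of what the compiler can emit, not necessarily what it does emit here), gcc's inline assembly can invoke fcos directly; the "t" constraint places the operand on top of the x87 register stack:
// Illustration only: fcos is valid for |x| < 2^63; real code needs
// explicit range reduction for larger arguments.
static inline long double x87_cos(long double x) {
    long double r;
    asm("fcos" : "=t"(r) : "0"(x));
    return r;
}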
You could try to verify this by looking at the assembly - look for call cos or x87 instructions - and it might also be worth compiling with -mfpmath=387 to see if the performance characteristics change.
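For instance (file names are placeholders), something along these lines would show which path the compiler chose:
g++ -O2 -funsafe-math-optimizations -S yourcode.cpp -o yourcode.s
grep -nE 'call[[:space:]]+(cos|sin|sincos)|fcos|fsin|fsincos' yourcode.s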
Upvotes: 2