Reputation: 81
cos() from math.h runs faster than the x86 asm fcos instruction
The following code compares the x86 fcos instruction with cos() from math.h.
In this code, 1,000,000 fcos instructions take 150 ms, while 1,000,000 cos() calls take only 80 ms.
How is fcos implemented on x86? Why is fcos so much slower than cos()?
My environment is an Intel i7-6820HQ, Windows 10 and Visual Studio 2017.
#include <math.h>
#include <stdio.h>
#include <time.h>

int main()
{
    const int i_max = 1000000;
    float start_value = 8.333333f;
    float* pstart_value = &start_value;
    clock_t a, b;

    // Time 1,000,000 x87 fcos instructions (32-bit MSVC inline asm only).
    a = clock();
    __asm {
        mov edx, pstart_value;
        fld dword ptr [edx];    // push start_value onto the x87 stack
    }
    for (int i = 0; i < i_max; i++) {
        __asm {
            fcos;               // st(0) = cos(st(0))
        }
    }
    __asm {
        fstp st(0);             // pop the value so the x87 stack is balanced again
    }
    b = clock();
    printf("asm time = %ld\n", (long)(b - a));

    // Time 1,000,000 calls to the library cos().
    a = clock();
    for (int i = 0; i < i_max; i++) {
        start_value = cos(start_value);
    }
    b = clock();
    printf("math time = %ld\n", (long)(b - a));
    return 0;
}
My understanding is that a single asm instruction is usually faster than a function call. Why is fcos so slow in this case?
Update: I ran the same code on another laptop with an i7-6700HQ. On that laptop, 1,000,000 fcos instructions take only 51 ms. Why is there such a big difference between the two CPUs?
Upvotes: 3
Views: 1285
Reputation: 67546
I bet the answer is simple: you do not use the result of cos, so it is optimized out, as in this example.
Change the variables to volatile to force the cos call.
Another guess: maybe your cos implementation uses lookup tables. In that case it could be faster than the hardware implementation.
Upvotes: 1