TBD

Reputation: 81

Why is the cos function in math.h faster than the x86 fcos instruction?

The cos() function from math.h runs faster than the x86 fcos instruction.

The following code compares the x86 fcos instruction with cos() from math.h.

In this code, 1000000 executions of fcos in inline asm take 150 ms, while 1000000 calls to cos() take only 80 ms.

How is fcos implemented on x86? Why is fcos so much slower than cos()?

My environment is an Intel i7-6820HQ, Windows 10, and Visual Studio 2017.

#include <stdio.h>  // printf
#include <time.h>   // clock, clock_t
#include <math.h>   // cos

int main()
{
  int i;
  const int i_max = 1000000;

  float c = 10000;
  float *d = &c;

  float start_value = 8.333333f;
  float* pstart_value = &start_value;
  clock_t a, b;
  a = clock();

  __asm {
    mov edx, pstart_value; 

    fld [edx];
  }

  for (i = 0; i < i_max; i++) {
    __asm {
        fcos;
    }
  }


  b = clock();
  printf("asm time = %ld\n", (long)(b - a));

  a = clock();
  double y;
  for (i = 0; i < i_max; i++) {
    start_value = cos(start_value);
  }

  b = clock();
  printf("math time = %ld\n", (long)(b - a));
  return 0;
}

As I understand it, a single asm instruction is usually faster than a function call. Why is fcos so slow in this case?


Update: I ran the same code on another laptop with an i7-6700HQ. On that laptop, 1000000 executions of fcos take only 51 ms. Why is there such a big difference between the two CPUs?

Upvotes: 3

Views: 1285

Answers (1)

0___________

Reputation: 67546

I bet the answer is simple: you never use the result of cos(), so the compiler optimizes the call away, as in this example:

https://godbolt.org/z/iw-nft

Make the variables volatile to force the cos() call:

https://godbolt.org/z/9_dpMs
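
For readers who cannot open the links, here is a minimal sketch of the volatile approach. It is not the code behind the links; the helper names `cos_chain` and `time_cos_chain_ms` are made up for illustration, and it uses portable C++ only (no inline asm), so it covers just the cos() half of the comparison.

```cpp
#include <cmath>
#include <cstdio>
#include <ctime>

// Hypothetical helper: applies cos() `iterations` times, feeding each result
// into the next call. The volatile qualifier forces a real load and store on
// every iteration, so the compiler cannot delete the loop.
double cos_chain(double start, int iterations) {
    volatile double value = start;
    for (int i = 0; i < iterations; ++i) {
        value = std::cos(value);
    }
    return value;
}

// Hypothetical helper: times cos_chain() with clock() and prints the result,
// which also gives the computed value an observable use.
long time_cos_chain_ms(double start, int iterations) {
    std::clock_t begin = std::clock();
    double result = cos_chain(start, iterations);
    std::clock_t end = std::clock();
    long elapsed_ms = static_cast<long>((end - begin) * 1000 / CLOCKS_PER_SEC);
    std::printf("result = %f, math time = %ld ms\n", result, elapsed_ms);
    return elapsed_ms;
}
```

Calling `time_cos_chain_ms(8.333333, 1000000)` from `main` corresponds to the second loop in the question, except that the work can no longer be removed as dead code.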

Another guess: maybe your cos() implementation uses lookup tables, in which case it could be faster than the hardware implementation.
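
To show what a table-based cosine could look like, here is a rough sketch. It is purely an illustration of the idea and an assumption on my part, not how any particular math library implements cos(); the class name `CosTable` and the table size are invented for the example.

```cpp
#include <cmath>
#include <vector>

const double kTwoPi = 6.283185307179586;

// Hypothetical table-based cosine: precomputes cos() at evenly spaced points
// over one period and linearly interpolates between neighbouring entries.
class CosTable {
public:
    explicit CosTable(int size) : size_(size), table_(size + 1) {
        for (int i = 0; i <= size; ++i) {
            table_[i] = std::cos(kTwoPi * i / size);
        }
    }

    double operator()(double x) const {
        // Range reduction: map x to a fraction of a full period in [0, 1).
        double turns = x / kTwoPi;
        turns -= std::floor(turns);
        if (turns >= 1.0) {
            turns = 0.0;  // guard against rounding up to exactly 1.0
        }
        double pos = turns * size_;
        int idx = static_cast<int>(pos);
        double frac = pos - idx;
        // Linear interpolation between the two nearest table entries.
        return table_[idx] + frac * (table_[idx + 1] - table_[idx]);
    }

private:
    int size_;
    std::vector<double> table_;
};
```

With 1024 entries the interpolation error stays below about 1e-5, which is far less accurate than a real cos(); a table like this trades precision for speed.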

Upvotes: 1
