Reputation: 99
Is the comparison of 32 bits faster than the comparison of 64 bits?
I was looking at this file http://www.netlib.org/fdlibm/s_cos.c
They have this piece of code
/* |x| ~< pi/4 */
ix &= 0x7fffffff;
if(ix <= 0x3fe921fb) return __kernel_cos(x,z);
I understand the first line, which calculates the absolute value of x. But why is the comparison so complicated? Is there any improvement in performance by comparing the first 32 bits and not all 64 bits? Could I write
long unsigned int ix = *(long unsigned int *)(&x);
ix &= 0x7fffffffffffffff;
if (ix < 0x3fe921fb54442d18)
/* What comes next */ ;
and expect the same performance in terms of speed on a 64-bit machine? Though I agree this would consume more memory.
0x3fe921fb54442d18 is pi/4.
Upvotes: 5
Views: 749
Reputation: 4848
On my 64-bit Intel machine with Apple clang version 12.0.0, I ran ix <= 0x3fe921fb, with ix declared as int, 2^30 (just over a billion) times, and repeated the run several times. I then ran ix < 0x3fe921fb54442d18, with ix declared as unsigned long, the same number of times. Here are the results in seconds:
No optimization 32-bit:
1.470922
1.448247
1.449718
1.446084
1.450020
1.453608
No optimization 64-bit:
1.567637
1.561653
1.565024
1.575094
1.567794
1.564141
-O3 32-bit:
0.421903
0.419469
0.425281
0.419894
0.425790
0.424800
-O3 64-bit:
0.636965
0.640522
0.637279
0.634344
0.634989
0.633755
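The 32-bit comparisons, at least on my machine, are consistently faster: on average the 64-bit loop took about 8% longer without optimization and about 50% longer at -O3.
I would go for the first option. In these runs it was faster at both optimization settings, and a 32-bit value also uses less memory; a 32-bit comparator is smaller too, so it should draw less power. Please also note that long unsigned int ix = *(long unsigned int *)(&x); with x of type double is technically undefined behavior, because it reads a double through a pointer to an incompatible type (a strict aliasing violation). Copying the bytes with memcpy is the well-defined alternative.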
Test Code:
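#include <stdio.h>
#include <time.h>

int main(void) {
    {   /* 32-bit comparison */
        volatile int ix = 0; // Changing this value has no effect
        clock_t before = clock();
        for (int i = 1 << 30; i--;) {
            volatile int sink = ix <= 0x3fe921fb; // volatile keeps the compare from being optimized away
            (void)sink;
        }
        printf("%f\n", (double)(clock() - before) / CLOCKS_PER_SEC);
    }
    {   /* 64-bit comparison */
        volatile unsigned long ix = 0;
        clock_t before = clock();
        for (int i = 1 << 30; i--;) {
            volatile int sink = ix < 0x3fe921fb54442d18;
            (void)sink;
        }
        printf("%f\n", (double)(clock() - before) / CLOCKS_PER_SEC);
    }
    return 0;
}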
Upvotes: 0