Reputation: 197
A long time ago I used this simple x86 assembler trick to obtain 0 or 1 as the result of a floating-point comparison:
fld [value1]
fcom [value2]
fnstsw ax
mov al, ah
and eax, 1
This trick avoids branching when the comparison result only selects one value from a set of 2 values. It was fast in the Pentium days; it may not be much faster now, but who knows.
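In C++ terms, the kind of selection this enables looks roughly like the sketch below. The helper name selectByComparison and its parameters are made up for illustration, and whether the compiler actually emits branch-free code for it depends on the compiler and flags. Note also that for NaN inputs the x87 sequence above yields 1 (fcom sets C0 on an unordered result), while the C++ `<` yields 0, so the two agree only for ordered inputs.

```cpp
#include <cassert>

// Hypothetical helper: returns ifLess when a < b, otherwise ifNotLess.
// The 0/1 comparison result is used as an array index rather than as a
// branch condition.
inline double selectByComparison(double a, double b,
                                 double ifNotLess, double ifLess) {
    const double table[2] = { ifNotLess, ifLess };
    const int index = (a < b) ? 1 : 0;  // 1 when a < b, 0 otherwise (0 for NaN)
    assert(index == 0 || index == 1);   // must be a valid index into table
    return table[index];
}
```

For example, selectByComparison(x, limit, 1.0, 0.5) would pick 0.5 whenever x is below limit and 1.0 otherwise.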
Now I mainly use C++ and compile with the Intel C++ Compiler or GCC.
Can someone please help me rewrite this code in the 2 inline assembler flavors (Intel and GCC)?
The required function prototype is: inline int compareDoublesIndexed( const double value1, const double value2 ) { ... }
Maybe using SSE2 operations would be even more efficient. What do you think?
I've tried this:
__asm__(
"fcomq %2, %0\n"
"fnstsw %ax\n"
"fsubq %2, %0\n"
"andq $L80, %eax\n"
"shrq $5, %eax\n"
"fmulq (%3,%eax), %0\n"
: "=f" (penv)
: "0" (penv), "F" (env), "r" (c)
: "eax" );
But I get an error from the Intel C++ Compiler: Floating point output constraint must specify a single register.
Upvotes: 1
Views: 1608
Reputation: 11582
As you mentioned, things have changed since the Pentium days.
So first check what the compiler generates; you might be pleasantly surprised. I tried g++ with -O3 on the following code:
fcmp.cpp:
int compareDoublesIndexed( const double value1, const double value2 ) {
return value1 < value2 ? 1 : 0;
}
This is what the compiler generated:
0000000000400690 <_Z21compareDoublesIndexeddd>:
400690: 31 c0 xor %eax,%eax
400692: 66 0f 2e c8 ucomisd %xmm0,%xmm1
400696: 0f 97 c0 seta %al
400699: c3 retq
This is what it means:
xor %eax,%eax ; EAX = 0
ucomisd %xmm0,%xmm1 ; compare value2 (in %xmm1) with value1 (in %xmm0)
seta %al ; AL = value2 > value1 ? 1 : 0
So the compiler avoided the conditional branch by using the seta instruction (set byte to 1 if the comparison result is "above", to 0 otherwise).
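If you would still rather spell out the SSE2 comparison by hand, intrinsics are usually easier to get right than inline assembly, and both GCC and the Intel compiler provide them in <emmintrin.h>. The following is only a sketch: the name compareDoublesIndexedSse2 is made up here, it assumes the target supports SSE2 (the default on x86-64), and the plain C++ fallback exists only so the snippet compiles on other targets. I have not measured whether it beats what the compiler produces from the plain `<` version.

```cpp
#include <cassert>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Hypothetical name: returns 1 if value1 < value2, otherwise 0.
inline int compareDoublesIndexedSse2(const double value1, const double value2) {
#if defined(__SSE2__)
    // _mm_set_sd places each double in the low lane of an XMM register;
    // _mm_ucomilt_sd compares the two low lanes (a ucomisd-style compare)
    // and returns the "less than" result as an int.
    const int r = _mm_ucomilt_sd(_mm_set_sd(value1), _mm_set_sd(value2));
#else
    // Fallback for targets without SSE2: let the compiler pick the code.
    const int r = value1 < value2 ? 1 : 0;
#endif
    assert(r == 0 || r == 1);  // the result is meant to be usable as an index
    return r;
}
```

The result can be used directly as an index into a two-element table, as in table[compareDoublesIndexedSse2(a, b)].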
Upvotes: 2