aleksv

Reputation: 197

Floating point number comparison trick: inline assembly

A long time ago, I used this simple x86 assembly trick to obtain 0 or 1 as the result of a floating-point comparison:

fld [value1]
fcom [value2]
fnstsw ax
mov al, ah
and eax, 1

This trick avoids a branch when the comparison result is only used to select one of two values. It was fast in the Pentium days; it may not be much faster now, but who knows.

Now I mainly use C++ and compile with the Intel C++ Compiler or GCC.

Can someone please help me rewrite this code in the two inline assembler flavors (Intel and GCC)?

The required function prototype is: inline int compareDoublesIndexed( const double value1, const double value2 ) { ... }
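For illustration, the intended use might look like the following sketch. The function body here is a plain C++ placeholder, not the assembly being asked for, and selectByComparison is a hypothetical name introduced only for this example.

```cpp
// Placeholder body in plain C++: returns 1 if value1 < value2, otherwise 0
// (also 0 when either value is NaN).
inline int compareDoublesIndexed( const double value1, const double value2 ) {
    return value1 < value2 ? 1 : 0;
}

// Hypothetical helper: picks one of two candidates by using the comparison
// result as an array index instead of an if/else.
inline double selectByComparison( const double value1, const double value2,
                                  const double ifNotLess, const double ifLess ) {
    const double candidates[2] = { ifNotLess, ifLess };
    return candidates[ compareDoublesIndexed( value1, value2 ) ];
}
```

Whether this compiles to branch-free machine code depends on the compiler and optimization flags.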

Maybe using SSE2 operations could be even more efficient. Your perspective?


I've tried this:

__asm__(
    "fcomq %2, %0\n"
    "fnstsw %ax\n"
    "fsubq %2, %0\n"
    "andq $L80, %eax\n"
    "shrq $5, %eax\n"
    "fmulq (%3,%eax), %0\n"
    : "=f" (penv)
    : "0" (penv), "F" (env), "r" (c)
    : "eax" );

But I get an error from the Intel C++ Compiler: Floating point output constraint must specify a single register.

Upvotes: 1

Views: 1608

Answers (1)

amdn

Reputation: 11582

As you mentioned, things have changed since the Pentium days:

  • SSE is now the preferred instruction set for floating point instead of x87, even for scalar operations
  • optimizing compilers are now very good

Therefore, first check what the compiler generates; you might be pleasantly surprised. I tried g++ with -O3 on the following code:

fcmp.cpp:

int compareDoublesIndexed( const double value1, const double value2 ) {
    return value1 < value2 ? 1 : 0;
}

This is what the compiler generated:

0000000000400690 <_Z21compareDoublesIndexeddd>:
  400690:       31 c0                   xor    %eax,%eax
  400692:       66 0f 2e c8             ucomisd %xmm0,%xmm1
  400696:       0f 97 c0                seta   %al
  400699:       c3                      retq   

This is what it means:

  xor     %eax,%eax        ; EAX = 0
  ucomisd %xmm0,%xmm1      ; compare value2 (in %xmm1) with value1 (in %xmm0)
  seta    %al              ; AL = value2 > value1 ? 1 : 0

So the compiler avoided the conditional branch by using the seta instruction, which sets the byte to 1 if the comparison result is "above" (here, value2 > value1) and to 0 otherwise.
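If you would rather request the SSE2 comparison explicitly than rely on the optimizer, the intrinsics from <emmintrin.h> are one option. The following is only a sketch: the function name compareDoublesIndexedSse2 and the HAVE_SSE2_SKETCH macro are made up for this example, the code falls back to the plain comparison when SSE2 is not available, and I have not measured whether it is any faster than the compiler's own output.

```cpp
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // SSE2 intrinsics
#define HAVE_SSE2_SKETCH 1  // hypothetical macro, used only in this sketch
#endif

// Hypothetical name; returns 1 if value1 < value2, otherwise 0 for ordinary
// (non-NaN) inputs.
inline int compareDoublesIndexedSse2( const double value1, const double value2 ) {
#ifdef HAVE_SSE2_SKETCH
    // _mm_set_sd puts each double in the low lane of an XMM register;
    // _mm_comilt_sd compares the low lanes and returns 1 if the first
    // is less than the second.
    return _mm_comilt_sd( _mm_set_sd( value1 ), _mm_set_sd( value2 ) );
#else
    return value1 < value2 ? 1 : 0;  // portable fallback without SSE2
#endif
}
```

For example, compareDoublesIndexedSse2( 1.0, 2.0 ) yields 1 and compareDoublesIndexedSse2( 2.0, 1.0 ) yields 0. NaN handling is worth checking on your own compiler before relying on it.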

Upvotes: 2
