Fedor

Reputation: 21099

Efficient check that two floating point values have distinct signs

I need to find whether two finite floating point values A and B have different signs or one of them is zero.

In many code examples I see the test as follows:

if ( (A <= 0 && B >= 0) || (A >= 0 && B <= 0) )

It works fine but looks inefficient to me, since several conditions are evaluated here and each conditional branch is a slow operation on modern CPUs.

The other option is to use multiplication of the floating-point values:

if ( A * B <= 0 )

It should work even in case of overflow, since infinities preserve the proper sign.

What is the most efficient way for performing the check (one of these two or probably some other)?

Upvotes: 0

Views: 661

Answers (2)

chux

Reputation: 153348

Efficient check that two floating point values have distinct signs

if ( A * B <= 0 ) fails to distinguish the signs when A or B is -0.0 or +0.0.

Consider signbit()

#include <math.h>  /* or <cmath> and std::signbit in C++ */

if (signbit(A) == signbit(B)) {
  ; // same sign
} else {
  ; // distinct signs
}

Trust the compiler to form efficient code - or use a better compiler.


OP then formed a different goal: "whether two finite floating point values A and B have distinct signs or one of them is zero."

Of course if (signbit(A) != signbit(B) || A == 0.0 || B == 0.0) meets this newer goal without the rounding-to-0.0 issue of if ( A * B <= 0 ) (pointed out by @Eric Postpischil), but it may not meet the fuzzy requirement of the most efficient way.

The best way to form efficient code - and avoid premature optimization, really the root of all evil - is to look at the larger context.

Upvotes: 3

phuclv

Reputation: 41764

It works fine but looks inefficient to me, since several conditions are evaluated here and each conditional branch is a slow operation on modern CPUs.

An optimizing compiler won't branch unnecessarily. Don't worry about premature optimization unless you're in a very tight hot loop. Your first snippet compiles to only 2 branches on x86-64 and ARM64 in the compilers I tried, and it can also be compiled to a branchless version. Of course it may still be slower than a simple A * B <= 0, but there's no way to know that for sure without a proper benchmark.

If you don't care about NaN, then you can do the check with a few bitwise operations:

auto a = std::bit_cast<uint64_t>(A);
auto b = std::bit_cast<uint64_t>(B);
const auto sign_mask = 1ULL << 63;
const auto magnitude_mask = ~sign_mask;
return ((a ^ b) & sign_mask)        // distinct sign bits
    || ((a & magnitude_mask) == 0)  // A is +0.0 or -0.0
    || ((b & magnitude_mask) == 0); // B is +0.0 or -0.0

If A and B have different signs, the first term (a ^ b) & sign_mask catches it. Otherwise the condition can only hold when at least one operand is a zero (of either sign), which the magnitude checks catch. But this works in the integer domain, so it may incur a cross-domain penalty when moving the values from the float domain to the int domain.

If std::bit_cast is not available, just replace it with memcpy(&a, &A, sizeof A)

Demo on Godbolt

Again, do a benchmark to determine what's best for your target: there's no solution that's fastest on every microarchitecture. If you really run this many times in a loop, you should use SIMD instead to check multiple values at the same time. You should also use profile-guided optimization so the compiler knows where and when to branch.

Upvotes: 1
