Reputation: 11268
Just found the following line in some old src code:

int e = (int)fmod(matrix[i], n);

where matrix is an array of int, and n is a size_t. I'm wondering why the use of fmod rather than % where we have integer arguments, i.e. why not:

int e = (matrix[i]) % n;

Could there possibly be a performance reason for choosing fmod over %, or is it just a strange bit of code?
Upvotes: 9
Views: 2912
Reputation: 643
fmod() is usually slower than an integer modulo operation, but it can handle larger values. The remarkable thing about fmod() is that its results are exact, with no precision loss, since all intermediate results have the same number of bits as the divisor, or fewer.
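To see this exactness in practice, here is a minimal sketch (not from the original answer; the values are arbitrary) comparing fmod() with the naive remainder formula x - trunc(x/y)*y, whose intermediate quotient gets rounded:

#include <cmath>
#include <cstdio>

int main()
{
    double x = 0x1p200, y = 3.0;            // 2^200 mod 3 == 1, since 2 == -1 (mod 3)
    double naive = x - trunc( x / y ) * y;  // the rounded quotient ruins the result
    double exact = fmod( x, y );            // no intermediate rounding: exactly 1
    printf( "naive = %g, fmod = %g\n", naive, exact );
}

On an IEEE-754 system fmod() returns exactly 1 here, while the naive formula is off by an astronomical amount because the quotient 2^200 / 3 cannot be represented in 53 mantissa bits.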
Here is my fmod() implementation in C++20, with glibc- and MSVC-compatible NaN and exception handling:
#include <cstdint>
#include <cfenv>
#include <bit>
#include <limits>
#include <cassert>

using namespace std;

double myFmod( double x, double y )
{
    constexpr uint64_t
        SIGN = 1ull << 63,      // sign bit
        IMPLICIT = 1ull << 52,  // implicit leading mantissa bit
        MANT = IMPLICIT - 1,    // mantissa mask
        QBIT = 1ull << 51;      // quiet-NaN bit
    uint64_t const
        binX = bit_cast<uint64_t>( x ),
        binY = bit_cast<uint64_t>( y );
    // static, so the capture-less lambdas below may call it
    static auto abs = []( uint64_t m ) { return m & ~SIGN; };
    auto isNaN = []( uint64_t m ) { return abs( m ) >= 0x7FF0000000000001u; };
    auto isSig = []( uint64_t m ) { return !(m & QBIT); };
    if( isNaN( binX ) ) [[unlikely]] // x == NaN
#if defined(_MSC_VER)
        return bit_cast<double>( isNaN( binY ) ? binY | (binX & binY & QBIT) : binX );
#else
    {
        if( isSig( binX ) || (isNaN( binY ) && isSig( binY )) ) [[unlikely]]
            feraiseexcept( FE_INVALID );
        return bit_cast<double>( binX | QBIT );
    }
#endif
    if( isNaN( binY ) ) [[unlikely]] // x != NaN, y == NaN
#if defined(_MSC_VER)
        return y;
#else
    {
        if( isSig( binY ) ) [[unlikely]]
            feraiseexcept( FE_INVALID );
        return bit_cast<double>( binY | QBIT );
    }
#endif
    auto isInf = []( uint64_t m ) { return abs( m ) == 0x7FF0000000000000u; };
    if( isInf( binX ) ) // x == Inf
    {
        feraiseexcept( FE_INVALID );
#if defined(_MSC_VER)
        return bit_cast<double>( (binX & ~MANT) | QBIT );
#else
        return -numeric_limits<double>::quiet_NaN();
#endif
    }
    if( !abs( binY ) ) [[unlikely]] // y == 0
    {
        feraiseexcept( FE_INVALID );
#if defined(_MSC_VER)
        return numeric_limits<double>::quiet_NaN();
#else
        return -numeric_limits<double>::quiet_NaN();
#endif
    }
    if( !abs( binX ) || isInf( binY ) ) [[unlikely]] // x == 0 || y == Inf
        return x;
    auto exp = []( uint64_t b ) -> int { return b >> 52 & 0x7FF; };
    int
        expX = exp( binX ),
        expY = exp( binY );
    auto mant = []( uint64_t b ) { return b & MANT; };
    uint64_t
        mantX = mant( binX ),
        mantY = mant( binY );
    // shift the highest set bit up to the implicit-bit position
    static auto normalize = []( int &exp, uint64_t &mant )
    {
        unsigned shift = countl_zero( mant ) - 11;
        mant <<= shift;
        exp -= shift;
    };
    // materialize the implicit bit, or normalize a subnormal input
    auto build = []( int &exp, uint64_t &mant )
    {
        if( exp ) [[likely]]
            mant |= IMPLICIT;
        else
        {
            exp = 1;
            normalize( exp, mant );
        }
    };
    build( expX, mantX );
    build( expY, mantY );
    uint64_t signX = binX & SIGN;
    int expDiff;
    // fold up to 11 bits of x per iteration into the remainder via integer %
    while( (expDiff = expX - expY) > 0 )
    {
        unsigned bits = expDiff <= 11 ? expDiff : 11;
        if( !(mantX = (mantX << bits) % mantY) ) [[unlikely]]
            return bit_cast<double>( signX );
        expX -= bits;
        normalize( expX, mantX );
    }
    if( !expDiff && mantX >= mantY ) [[unlikely]]
        if( (mantX -= mantY) ) [[likely]]
            normalize( expX, mantX );
        else
            return bit_cast<double>( signX );
    if( expX <= 0 ) [[unlikely]] // the result is subnormal
    {
        assert(expX >= -51);
        mantX = mantX >> (unsigned)(-expX + 1);
        expX = 0;
    }
    return bit_cast<double>( signX | (uint64_t)expX << 52 | (mantX & MANT) );
}
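A quick sanity check, assuming the function above is in the same translation unit (this harness is not part of the original answer; the test values are arbitrary):

#include <cmath>
#include <cstdio>

int main()
{
    double tests[][2] = { { 5.5, 2.0 }, { -7.25, 3.0 }, { 0x1p200, 3.0 }, { 1.0, 0.0 } };
    for( auto &t : tests )
        printf( "myFmod(%g, %g) = %g, std::fmod = %g\n",
            t[0], t[1], myFmod( t[0], t[1] ), std::fmod( t[0], t[1] ) );
}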
Upvotes: 0
Reputation: 145277
fmod might be a tiny bit faster than the integer division on selected architectures.

Note however that if n has a known non-zero value at compile time, matrix[i] % n would be compiled as a multiplication with a small adjustment, which should be much faster than both the integer modulus and the floating point modulus, as shown in the sketch below.
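To illustrate (a sketch, not from the original answer; the exact instruction sequence varies by compiler and target):

// Constant divisor: GCC/Clang at -O2 replace the division behind x % 7
// with a multiply by a precomputed "magic" reciprocal plus shifts and a
// small adjustment; no division instruction is emitted.
int mod_const( int x ) { return x % 7; }

// Runtime divisor: the compiler has no choice but to emit an actual
// (slow) division instruction.
int mod_runtime( int x, int n ) { return x % n; }

Compiling both functions and inspecting the assembly shows the difference directly.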
Another interesting difference is the behavior on n == 0 and INT_MIN % -1. The integer modulus operation invokes undefined behavior on division by zero and on overflow, which results in abnormal program termination on many current architectures. Conversely, the floating point modulus does not have these corner cases: fmod(x, 0.0) yields NaN and fmod(INT_MIN, -1.0) is simply 0, both well defined. A NaN exceeds the range of int and its conversion back to int is implementation defined, but it does not usually cause abnormal program termination. This might be the reason for the original programmer to have chosen this surprising solution.
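A small demonstration (not part of the original answer; since the integer cases are undefined behavior, what actually happens with % varies by platform):

#include <climits>
#include <cmath>
#include <cstdio>

int main()
{
    int x = INT_MIN;
    // x % -1 overflows (the quotient INT_MAX + 1 is unrepresentable):
    // undefined behavior, commonly a SIGFPE trap on x86. Same for x % 0.
    // The floating point versions are fully defined:
    printf( "%g\n", fmod( (double)x, -1.0 ) ); // -0: the remainder is zero
    printf( "%g\n", fmod( (double)x, 0.0 ) );  // nan: domain error, no trap
}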
Upvotes: 3
Reputation: 37954
Could there possibly be a performance reason for choosing fmod over %, or is it just a strange bit of code?
The fmod might be a bit faster on architectures with a high-latency IDIV instruction that takes (say) ~50 cycles or more, so the cost of fmod's function call and the int <-> double conversions can be amortized.
According to Agner Fog's instruction tables, IDIV on the AMD K10 architecture takes 24-55 cycles. By comparison, on modern Intel Haswell its latency range is listed as 22-29 cycles; however, if there are no dependency chains, the reciprocal throughput is much better on Intel, at 8-11 clock cycles.
Upvotes: 3
Reputation: 57105
Experimentally (and quite counter-intuitively), fmod
is faster than %
- at least on AMD Phenom(tm) II X4 955 with 6400 bogomips. Here are two programs that use either of the techniques, both compiled with the same compiler (GCC) and the same options (cc -O3 foo.c -lm
), and ran on the same hardware:
#include <math.h>
#include <stdio.h>

int main()
{
    int volatile a = 10, b = 12;
    int i;
    long long sum = 0; /* long long: a billion additions of 10 would overflow a 32-bit int */
    for (i = 0; i < 1000000000; i++)
        sum += a % b;
    printf("%lld\n", sum);
    return 0;
}
Running time: 9.07 sec.
#include <math.h>
#include <stdio.h>

int main()
{
    int volatile a = 10, b = 12;
    int i;
    long long sum = 0; /* long long: avoids the signed overflow, as above */
    for (i = 0; i < 1000000000; i++)
        sum += (int)fmod(a, b);
    printf("%lld\n", sum);
    return 0;
}
Running time: 8.04 sec.
Upvotes: 1