bph

Reputation: 11268

Is fmod faster than % for integer modulus calculation?

Just found the following line in some old source code:

int e = (int)fmod(matrix[i], n);

where matrix is an array of int and n is a size_t.

I'm wondering why fmod was used rather than %, given that both arguments are integers, i.e. why not:

int e = (matrix[i]) % n;

Could there possibly be a performance reason for choosing fmod over % or is it just a strange bit of code?

Upvotes: 9

Views: 2912

Answers (4)

Edison von Myosotis

Reputation: 643

fmod() is usually slower than an integer modulo operation, but it can handle larger values. The remarkable property of fmod() is that its result is always exact: every intermediate remainder has no more bits than the divisor, so there is no precision loss.
This is my fmod() implementation, with glibc- and MSVC-compatible NaN and exception handling, in C++20:

#include <bit>
#include <cassert>
#include <cfenv>
#include <cstdint>
#include <limits>

using std::bit_cast, std::countl_zero, std::numeric_limits;

double myFmod( double x, double y )
{
    constexpr uint64_t
        SIGN = 1ull << 63,
        IMPLICIT = 1ull << 52,
        MANT = IMPLICIT - 1,
        QBIT = 1ull << 51;
    uint64_t const
        binX = bit_cast<uint64_t>( x ),
        binY = bit_cast<uint64_t>( y );
    static auto abs = []( uint64_t m ) { return m & ~SIGN; };
    auto isNaN = []( uint64_t m ) { return abs( m ) >= 0x7FF0000000000001u; };
    auto isSig = []( uint64_t m ) { return !(m & QBIT); };
    if( isNaN( binX ) ) [[unlikely]] // x == NaN
#if defined(_MSC_VER)
        return bit_cast<double>( isNaN( binY ) ? binY | binX & binY & QBIT : binX );
#else
    {
        if( isSig( binX ) || isNaN( binY ) && isSig( binY ) ) [[unlikely]]
            feraiseexcept( FE_INVALID );
        return bit_cast<double>( binX | QBIT );
    }
#endif
    if( isNaN( binY ) ) [[unlikely]] // x != NaN || y == NaN
#if defined(_MSC_VER)
        return y;
#else
    {
        if( isSig( binY ) ) [[unlikely]]
            feraiseexcept( FE_INVALID );
        return bit_cast<double>( binY | QBIT );
    }
#endif
    auto isInf = []( uint64_t m ) { return abs( m ) == 0x7FF0000000000000u; };
    if( isInf( binX ) ) // x == Inf
    {
        feraiseexcept( FE_INVALID );
#if defined(_MSC_VER)
        return bit_cast<double>( binX & ~MANT | QBIT );
#else
        return -numeric_limits<double>::quiet_NaN();
#endif
    }
    if( !abs( binY ) ) [[unlikely]] // y == 0
    {
        feraiseexcept( FE_INVALID );
#if defined(_MSC_VER)
        return numeric_limits<double>::quiet_NaN();
#else
        return -numeric_limits<double>::quiet_NaN();
#endif
    }
    if( !abs( binX ) || isInf( binY ) ) [[unlikely]] // x == 0 || y == Inf
        return x;
    auto exp = []( uint64_t b ) -> int { return b >> 52 & 0x7FF; };
    int
        expX = exp( binX ),
        expY = exp( binY );
    auto mant = []( uint64_t b ) { return b & MANT; };
    uint64_t
        mantX = mant( binX ),
        mantY = mant( binY );
    static auto normalize = []( int &exp, uint64_t &mant )
    {
        unsigned shift = countl_zero( mant ) - 11;
        mant <<= shift;
        exp -= shift;
    };
    auto build = []( int &exp, uint64_t &mant )
    {
        if( exp ) [[likely]]
            mant |= IMPLICIT;
        else
        {
            exp = 1;
            normalize( exp, mant );
        }
    };
    build( expX, mantX );
    build( expY, mantY );
    uint64_t signX = binX & SIGN;
    int expDiff;
    while( (expDiff = expX - expY) > 0 )
    {
        unsigned bits = expDiff <= 11 ? expDiff : 11;
        if( !(mantX = (mantX << bits) % mantY) ) [[unlikely]]
            return bit_cast<double>( signX );
        expX -= bits;
        normalize( expX, mantX );
    }
    if( !expDiff && mantX >= mantY ) [[unlikely]]
        if( (mantX -= mantY) ) [[likely]]
            normalize( expX, mantX );
        else
            return bit_cast<double>( signX );
    if( expX <= 0 ) [[unlikely]]
    {
        assert(expX >= -51);
        mantX = mantX >> (unsigned)(-expX + 1);
        expX = 0;
    }
    return bit_cast<double>( signX | (uint64_t)expX << 52 | mantX & MANT );
}

Upvotes: 0

chqrlie

Reputation: 145277

fmod might be a tiny bit faster than the integer division on selected architectures.

Note however that if n has a known non-zero value at compile time, matrix[i] % n would be compiled into a multiplication by a precomputed reciprocal plus a small adjustment, which should be much faster than both the integer and the floating-point modulus.

Another interesting difference is the behavior for n == 0 and for INT_MIN % -1. The integer modulus operation invokes undefined behavior in these cases, which results in abnormal program termination (typically SIGFPE) on many current architectures. Conversely, the floating-point modulus has no such corner cases: fmod(x, 0.0) yields NaN and fmod(INT_MIN, -1.0) yields 0. A NaN exceeds the range of int and converting it back is not defined by the standard, but it does not usually cause abnormal program termination. This might be the reason the original programmer chose this surprising solution.

Upvotes: 3

Grzegorz Szpetkowski

Reputation: 37954

Could there possibly be a performance reason for choosing fmod over % or is it just a strange bit of code?

The fmod might be a bit faster on architectures with a high-latency IDIV instruction that takes (say) ~50 cycles or more, so the cost of fmod's function call and the int <---> double conversions can be amortized.

According to Agner Fog's instruction tables, IDIV on the AMD K10 architecture takes 24-55 cycles. On modern Intel Haswell its latency range is listed as 22-29 cycles; however, if there are no dependency chains, the reciprocal throughput is much better on Intel, at 8-11 clock cycles.

Upvotes: 3

DYZ

Reputation: 57105

Experimentally (and quite counter-intuitively), fmod is faster than % - at least on an AMD Phenom(tm) II X4 955 at 6400 bogomips. Here are two programs that use one technique each, both compiled with the same compiler (GCC) and the same options (cc -O3 foo.c -lm), and run on the same hardware:

#include <math.h>
#include <stdio.h>

int main()
{
    int volatile a=10,b=12;
    int i, sum = 0;
    for (i = 0; i < 1000000000; i++)
        sum += a % b;
    printf("%d\n", sum);
    return 0;
}

Running time: 9.07 sec.

#include <math.h>
#include <stdio.h>

int main()
{
    int volatile a=10,b=12;
    int i, sum = 0;
    for (i = 0; i < 1000000000; i++)
        sum += (int)fmod(a, b);
    printf("%d\n", sum);
    return 0;
}

Running time: 8.04 sec.

Upvotes: 1
