bph
bph

Reputation: 11258

Is fmod faster than % for integer modulus calculation

Just found the following line in some old src code:

int e = (int)fmod(matrix[i], n);

where matrix is an array of int, and n is a size_t

I'm wondering why the use of fmod rather than % where we have integer arguments, i.e. why not:

int e = (matrix[i]) % n;

Could there possibly be a performance reason for choosing fmod over % or is it just a strange bit of code?

Upvotes: 9

Views: 2868

Answers (3)

chqrlie
chqrlie

Reputation: 144520

fmod might be a tiny bit faster than the integer division on selected architectures.

Note however that if n has a known non zero value at compile time, matrix[i] % n would be compiled as a multiplication with a small adjustment, which should be much faster than both the integer modulus and the floating point modulus.

Another interesting difference is the behavior on n == 0 and INT_MIN % -1. The integer modulus operation invokes undefined behavior on overflow which results in abnormal program termination on many current architectures. Conversely, the floating point modulus does not have these corner cases, the result is +Infinity, -Infinity, Nan depending on the value of matrix[i] and -INT_MIN, all exceeding the range of int and the conversion back to int is implementation defined, but does not usually cause abnormal program termination. This might be the reason for the original programmer to have chosen this surprising solution.

Upvotes: 3

Grzegorz Szpetkowski
Grzegorz Szpetkowski

Reputation: 37904

Could there possibly be a performance reason for choosing fmod over % or is it just a strange bit of code?

The fmod might be a bit faster on architectures with high-latency IDIV instruction, that takes (say) ~50 cycles or more, so fmod's function call and int <---> doubleconversions cost can be amortized.

According to Agner's Fog instruction tables, IDIV on AMD K10 architecture takes 24-55 cycles. Comparing with modern Intel Haswell, its latency range is listed as 22-29 cycles, however if there are no dependency chains, the reciprocal throughput is much better on Intel, 8-11 clock cycles.

Upvotes: 3

DYZ
DYZ

Reputation: 57033

Experimentally (and quite counter-intuitively), fmod is faster than % - at least on AMD Phenom(tm) II X4 955 with 6400 bogomips. Here are two programs that use either of the techniques, both compiled with the same compiler (GCC) and the same options (cc -O3 foo.c -lm), and ran on the same hardware:

#include <math.h>
#include <stdio.h>

int main()
{
    int volatile a=10,b=12;
    int i, sum = 0;
    for (i = 0; i < 1000000000; i++)
        sum += a % b;
    printf("%d\n", sum);
    return 0;
}

Running time: 9.07 sec.

#include <math.h>
#include <stdio.h>

int main()
{
    int volatile a=10,b=12;
    int i, sum = 0;
    for (i = 0; i < 1000000000; i++)
        sum += (int)fmod(a, b);
    printf("%d\n", sum);
    return 0;
}

Running time: 8.04 sec.

Upvotes: 1

Related Questions