Reputation: 11258
Just found the following line in some old src code:
int e = (int)fmod(matrix[i], n);
where matrix
is an array of int
, and n
is a size_t
I'm wondering why the use of fmod
rather than %
where we have integer arguments, i.e. why not:
int e = (matrix[i]) % n;
Could there possibly be a performance reason for choosing fmod
over %
or is it just a strange bit of code?
Upvotes: 9
Views: 2868
Reputation: 144520
fmod
might be a tiny bit faster than the integer division on selected architectures.
Note however that if n
has a known non zero value at compile time, matrix[i] % n
would be compiled as a multiplication with a small adjustment, which should be much faster than both the integer modulus and the floating point modulus.
Another interesting difference is the behavior on n == 0
and INT_MIN % -1
. The integer modulus operation invokes undefined behavior on overflow which results in abnormal program termination on many current architectures. Conversely, the floating point modulus does not have these corner cases, the result is +Infinity
, -Infinity
, Nan
depending on the value of matrix[i]
and -INT_MIN
, all exceeding the range of int
and the conversion back to int
is implementation defined, but does not usually cause abnormal program termination. This might be the reason for the original programmer to have chosen this surprising solution.
Upvotes: 3
Reputation: 37904
Could there possibly be a performance reason for choosing
fmod
over%
or is it just a strange bit of code?
The fmod
might be a bit faster on architectures with high-latency IDIV
instruction, that takes (say) ~50 cycles or more, so fmod
's function call and int <---> double
conversions cost can be amortized.
According to Agner's Fog instruction tables, IDIV
on AMD K10 architecture takes 24-55 cycles. Comparing with modern Intel Haswell, its latency range is listed as 22-29 cycles, however if there are no dependency chains, the reciprocal throughput is much better on Intel, 8-11 clock cycles.
Upvotes: 3
Reputation: 57033
Experimentally (and quite counter-intuitively), fmod
is faster than %
- at least on AMD Phenom(tm) II X4 955 with 6400 bogomips. Here are two programs that use either of the techniques, both compiled with the same compiler (GCC) and the same options (cc -O3 foo.c -lm
), and ran on the same hardware:
#include <math.h>
#include <stdio.h>
int main()
{
int volatile a=10,b=12;
int i, sum = 0;
for (i = 0; i < 1000000000; i++)
sum += a % b;
printf("%d\n", sum);
return 0;
}
Running time: 9.07 sec.
#include <math.h>
#include <stdio.h>
int main()
{
int volatile a=10,b=12;
int i, sum = 0;
for (i = 0; i < 1000000000; i++)
sum += (int)fmod(a, b);
printf("%d\n", sum);
return 0;
}
Running time: 8.04 sec.
Upvotes: 1