Reputation: 289
I know that for unsigned integers, I can replace the modulo operation with a bitmask if the divisor is a power of two. Do any numbers have a similar property for floats? That is, are there any numbers n for which f mod n can be calculated more efficiently than in the general case, not necessarily using a bitmask?
Other than, of course, one. Brain failure
Edit: to clarify, f is any floating-point number (determined at runtime), n is any compile-time constant number in any format, and I expect the result to be a float.
Upvotes: 6
Views: 1199
Reputation: 222234
The mathematics works the same for floating-point types as it does for integer types: If n is a power of the radix (two for binary), then f modulo n can be computed by zeroing digits representing values n or greater (also known as the high bits or high digits).
So, for a binary integer with bits b15 b14 b13 b12 b11 b10 b9 b8 b7 b6 b5 b4 b3 b2 b1 b0, we can compute the residue modulo four simply by setting b15 to b2 to zero, leaving only b1 b0.
Similarly, if the radix of the floating-point format is two, we can compute the residue modulo four by removing all digits whose value is four or greater. This does not require a division, but it does require examining the bits representing the value. A simple bit mask alone will not suffice.
The C standard characterizes a floating-point type as a sign (±1), a base b, an exponent, and some number of base-b digits. Thus, if we know the format a particular C implementation uses to represent a floating-point type (the way the sign, exponent, and digits are encoded into bits), f modulo n, where n is a power of b, can be calculated by decoding the exponent, determining which significand digits represent values less than n, and removing the others.
Sample code:
// This code assumes double is IEEE 754 basic 64-bit binary floating-point.
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
// Return the bits representing the double x.
static uint64_t Bits(double x)
{ return (union { double d; uint64_t u; }) { x } .u; }
// Return the double represented by bits x.
static double Double(uint64_t x)
{ return (union { uint64_t u; double d; }) { x } .d; }
// Return x modulo 2**E.
static double Mod(double x, int E)
{
    uint64_t b = Bits(x);
    int e = b >> 52 & 0x7ff;

    // If x is a NaN, return it.
    if (x != x) return x;

    // If x is infinite, return a NaN.
    if (!isfinite(x)) return NAN;

    // If x is subnormal, adjust its exponent.
    if (e == 0) e = 1;

    // Remove the encoding bias from e and get the difference in exponents.
    e = (e-1023) - E;

    // Calculate number of bits to keep. (Could be consolidated above, kept for illustration.)
    e = 52 - e;

    if (e <= 0) return 0;
    if (53 <= e) return x;

    // Remove the low e bits (temporarily).
    b = b >> e << e;

    /* Convert b to a double and subtract the bits we left in it from the
       original number, thus leaving the bits that were removed from b.
    */
    return x - Double(b);
}
static void Try(double x, int E)
{
    double expected = fmod(x, scalb(1, E));
    double observed = Mod(x, E);
    if (expected == observed)
        printf("Mod(%a, %d) = %a.\n", x, E, observed);
    else
    {
        printf("Error, Mod(%g, %d) = %g, but expected %g.\n",
            x, E, observed, expected);
        exit(EXIT_FAILURE);
    }
}
int main(void)
{
    double n = 4;

    // Calculate the base-two logarithm of n.
    int E;
    frexp(n, &E);
    E -= 1;

    Try(7, E);
    Try(0x1p53 + 2, E);
    Try(0x1p53 + 6, E);
    Try(3.75, E);
    Try(-7, E);
    Try(0x1p-1049, E);
}
Upvotes: 6
Reputation: 12635
Yes. When the divisor is a power of two, it is also possible to do a modulo operation with a pseudo-mask, but in this case we have to consider the fact that floating-point numbers are formatted following the IEEE 754 standard.
Let's assume we do the same operation, but this time, as the numbers are real numbers, the power-of-two number will be a 1 bit followed by an infinite string of 0s:
1000000000000000000000000... * 2^exp
To get the mask, we do the same thing as with integers: change the 1 to a 0, and all the bits following that digit change to 1s:
0111111111111111111111111... * 2^exp =
1111111111111111111111111... * 2^(exp-1)
(This number is (1.0 - FLT_EPSILON) * 2^(exp_of_modulus - 1).)
But this is an all-ones mask, so the mantissa is never touched, except for the bits whose values are at or above the modulus (which go to zero). When we zero all these bits, we don't need to do anything but shift them out to the left and let them drop into the waste basket, because they are masked out. So the masking of the mantissa is always a left shift (filling in more digits on the right from the number... oops, we don't have them, so fill with zeros/random bits/ones...), followed by normalizing the number (that is, shifting until the first significant 1 reaches the leading place and adjusting the biased exponent accordingly).
Let's see an example: We have 2.0 as the modulus, and M_PI (3.141592...) as the number to mask:
3.141592... = 40 49 0f db
= 0100 0000 0100 1001 0000 1111 1101 1011...
; which represents
= 0 (positive)
100 0000 0 (exponent biased 128 ==> 1)
(1.)100 1001 0000 1111 1101 1011...
^ Always 1, so IEEE-754 doesn't store it; we have to restore it to operate.
11.00 1001 0000 1111 1101 1011...
01.11 1111 1111 1111 1111 1111...
MASKED == 01.00 1001 0000 1111 1101 1011...
/ // //// //// //// //// ////,-- introduced to complete format
RENORMALIZED == 1.001 0010 0001 1111 1011 011X...
; result = 0 (positive)
= 011 1111 1 (new exponent after norm. 127 ==> 0)
1.001 0010 0001 1111 1011 011X
= 0011 1111 1001 0010 0001 1111 1011 011X
which is 1.141592..., as expected.
As you see, we have to check the difference between the biased exponents and left-shift the mantissa as many places as that difference indicates. At the same time, we need to subtract that difference from the biased exponent (checking for underflow and subnormal cases) and renormalize the number (left-shift the mantissa and decrement the exponent until the first 1 in the mantissa reaches the hidden-bit place).
I assumed the biased exponent of the modulus is lower than that of the number being operated on; in the opposite case, the mask is all ones and the number is not affected by it.
Upvotes: 0
Reputation: 7864
If n == 1.0 or n == -1.0, then you can do:
r = f - trunc(f);
On x86_64, trunc will typically use the ROUNDSD instruction, so this will be pretty fast.
If n is a power of 2 with magnitude greater than or equal to 1, and your platform has a fma function that is native (for Intel, this means Haswell or newer), then you could do
r = fma(-trunc(f / n), n, f);
Any reasonable compiler should switch the division to a multiplication, and fold the negation into the appropriate FMA (or the constant), resulting in a multiplication, a truncation and an FMA.
This can also work for smaller powers of 2, as long as the intermediate result doesn't overflow (since if it could, the compiler wouldn't be free to substitute the division).
Whether any compilers will actually do this is another matter. Floating point remainder functions aren't used much, and don't get much attention from compiler writers, e.g. https://bugs.llvm.org/show_bug.cgi?id=3359
Upvotes: 5
Reputation:
In general, no. However, given the "fuzzy" nature of floating point (at least IEEE 754), there are tricks that abuse it to radically speed up certain calculations by approximation, at the cost of memory, precision, portability, maintainability, or a combination of these.
The simplest approach is to precalculate the operation and store the result into a lookup table before runtime. The larger you make the table, the more memory you use, but you also gain more precision.
A more exotic approach dabbles with the floating-point representation in memory. One of the more famous floating-point hacks is the fast inverse square root. Building on those concepts, the general idea is that you can make an approximation function for just about anything you want, not just inverse square roots. If you know your input range and your tolerance for error, you can construct such an algorithm tuned precisely for your purposes.
And it would be very remiss of me not to state the importance of benchmarking! If you want to apply any of these techniques in an effort to speed up your program, benchmark first, and be certain that you are optimizing the right place!
Upvotes: 2