Reputation: 289
I know that for unsigned integers, I can replace the modulo operation with a bitmask if the divisor is a power of two. Do any numbers have a similar property for floats? That is, are there any numbers n for which f mod n can be calculated more efficiently than in the general case, not necessarily using a bitmask?
Other than, of course, one. Brain failure
Edit: to clarify, f is any floating-point number (determined at runtime), n is any compile-time constant number in any format, and I expect the result to be a float.
Upvotes: 6
Views: 1199
Reputation: 222234
The mathematics works the same for floating-point types as it does for integer types: If n is a power of the radix (two for binary), then f modulo n can be computed by zeroing digits representing values n or greater (also known as the high bits or high digits).
So, for a binary integer with bits b15 b14 b13 b12 b11 b10 b9 b8 b7 b6 b5 b4 b3 b2 b1 b0, we can compute the residue modulo four simply by setting b15 to b2 to zero, leaving only b1 b0.
Similarly, if the radix of the floating-point format is two, we can compute the residue modulo four by removing all digits whose value is four or greater. This does not require a division, but it does require examining the bits representing the value. A simple bit mask alone will not suffice.
The C standard characterizes a floating-point type as a sign (±1), a base b, an exponent, and some number of base-b digits. Thus, if we know the format a particular C implementation uses to represent a floating-point type (the way the sign, exponent, and digits are encoded into bits), f modulo n, where n is a power of b, can be calculated by decoding the exponent, determining which significand digits represent values less than n, and removing the others.
Sample code:
// This code assumes double is IEEE 754 basic 64-bit binary floating-point.
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
// Return the bits representing the double x.
static uint64_t Bits(double x)
{ return (union { double d; uint64_t u; }) { x } .u; }
// Return the double represented by bits x.
static double Double(uint64_t x)
{ return (union { uint64_t u; double d; }) { x } .d; }
// Return x modulo 2**E.
static double Mod(double x, int E)
{
    uint64_t b = Bits(x);
    int e = b >> 52 & 0x7ff;

    // If x is a NaN, return it.
    if (x != x) return x;

    // If x is infinite, return a NaN.
    if (!isfinite(x)) return NAN;

    // If x is subnormal, adjust its exponent.
    if (e == 0) e = 1;

    // Remove the encoding bias from e and get the difference in exponents.
    e = (e-1023) - E;

    // Calculate number of bits to keep. (Could be consolidated above, kept for illustration.)
    e = 52 - e;

    if (e <= 0) return 0;
    if (53 <= e) return x;

    // Remove the low e bits (temporarily).
    b = b >> e << e;

    /* Convert b to a double and subtract the bits we left in it from the
       original number, thus leaving the bits that were removed from b.
    */
    return x - Double(b);
}
static void Try(double x, int E)
{
    double expected = fmod(x, scalb(1, E));
    double observed = Mod(x, E);
    if (expected == observed)
        printf("Mod(%a, %d) = %a.\n", x, E, observed);
    else
    {
        printf("Error, Mod(%g, %d) = %g, but expected %g.\n",
            x, E, observed, expected);
        exit(EXIT_FAILURE);
    }
}
int main(void)
{
    double n = 4;

    // Calculate the base-two logarithm of n.
    int E;
    frexp(n, &E);
    E -= 1;

    Try(7, E);
    Try(0x1p53 + 2, E);
    Try(0x1p53 + 6, E);
    Try(3.75, E);
    Try(-7, E);
    Try(0x1p-1049, E);
}
Upvotes: 6
Reputation: 12635
Yes. When the divisor is a power of two, it is also possible to do a modulo operation with a pseudo-mask, but in this case we have to consider the fact that floating-point numbers are formatted following the IEEE 754 standard.
Let's assume we do the same operation, but this time, as the numbers are real numbers, the power-of-two number will be a 1 bit followed by an infinite string of 0s:
1000000000000000000000000... * 2^exp
To get the mask, we do the same thing as with integers: change the 1 to a 0, and all the bits following that digit change to 1s:
0111111111111111111111111... * 2^exp =
1111111111111111111111111... * 2^(exp-1)
(This number is (1.0 - FLT_EPSILON) * 2^(exp_of_modulus - 1).)
But this is an all-ones mask, so the mantissa is never touched, except for the bits whose values are at or above the modulus (which go to zero). When we zero all these bits, we don't need to do anything but shift them out to the left and let them drop into the waste basket, because they are masked out. So the masking of the mantissa is always a left shift (filling in more digits on the right from the number... oops, we don't have them, so fill with zeros/random bits/ones...), followed by normalizing the number (that is, shifting until the first significant 1 reaches the leading place and adjusting the biased exponent accordingly).
Let's see an example: We have 2.0 as the modulus, and M_PI (3.141592...) as the number to mask:
3.141592... = 40 49 0f db
= 0100 0000 0100 1001 0000 1111 1101 1011...
; which represents
= 0 (positive)
100 0000 0 (exponent biased 128 ==> 1)
(1.)100 1001 0000 1111 1101 1011...
^ Always 1, so IEEE-754 doesn't store it; we have to restore it to operate.
11.00 1001 0000 1111 1101 1011...
01.11 1111 1111 1111 1111 1111...
MASKED == 01.00 1001 0000 1111 1101 1011...
/ // //// //// //// //// ////,-- introduced to complete format
RENORMALIZED == 1.001 0010 0001 1111 1011 011X...
; result = 0 (positive)
= 011 1111 1 (new exponent after norm. 127 ==> 0)
1.001 0010 0001 1111 1011 011X
= 0011 1111 1001 0010 0001 1111 1011 011X
which is 1.141592..., as expected.
As you see, we have to check the difference between the biased exponents and left-shift the mantissa as many places as that difference indicates. At the same time, we need to subtract that difference from the biased exponent (checking for underflow and subnormal cases) and renormalize the number (left-shift the mantissa and decrement the exponent until the first 1 in the mantissa reaches the hidden-bit place).
I assumed the biased exponent of the modulus is lower than that of the number being operated on; in the opposite case, the mask is all ones and the number is not affected by it.
Upvotes: 0
Reputation: 7864
If n == 1.0 or n == -1.0, then you can do:
r = f - trunc(f);
On x86_64, trunc will typically use the ROUNDSD instruction, so this will be pretty fast.
If n is a power of 2 with magnitude greater than or equal to 1, and your platform has a fma function that is native (for Intel, this means Haswell or newer), then you could do
r = fma(-trunc(f / n), n, f);
Any reasonable compiler should switch the division to a multiplication, and fold the negation into the appropriate FMA (or the constant), resulting in a multiplication, a truncation and an FMA.
This can also work for smaller powers of 2, as long as the intermediate result doesn't overflow (since if it could, the compiler wouldn't be free to substitute the division).
Whether any compilers will actually do this is another matter. Floating point remainder functions aren't used much, and don't get much attention from compiler writers, e.g. https://bugs.llvm.org/show_bug.cgi?id=3359
Upvotes: 5
Reputation:
In general, no. However, given the "fuzzy" nature of floating point (at least IEEE 754), there are tricks that abuse it to radically speed up certain calculations by approximation, at the cost of memory, precision, portability, maintainability, or a combination of these.
The simplest approach is to precalculate the operation and store the result into a lookup table before runtime. The larger you make the table, the more memory you use, but you also gain more precision.
A more exotic approach dabbles with the floating-point representation in memory. One of the more famous floating-point hacks is the fast inverse square root. Building on those concepts, the general idea is that you can make an approximation function for just about anything you want, not just inverse square roots. If you know your input range and your tolerance for error, you can construct such an algorithm tuned precisely for your purposes.
And it would be very remiss of me not to state the importance of benchmarking! If you want to apply any of these techniques in an effort to speed up your program, benchmark first, and be certain that you are optimizing the right place!
Upvotes: 2