Wayne

Reputation: 2989

Convert mantissa and exponent into double

In a very high-performance app, we find that the CPU can do long arithmetic significantly faster than double arithmetic. It was also determined that our system never needs more than 9 decimal places of precision. So we use longs for all floating-point arithmetic, with 9 decimal places implied.

However, in certain parts of the system it is more convenient, for readability, to work with doubles. So we have to convert a long value that assumes 9 decimal places into a double.

We find that simply taking the long and dividing by 10 to the power of 9, or multiplying by 1 divided by 10 to the power of 9, gives imprecise representations in a double.

To solve that, we are using Math.Round(value, 9) to give the precise values.

However, Math.Round() is horrifically slow.

So our idea at the moment is to convert the mantissa and exponent directly to the binary format of a double, since that way there will be zero need for rounding.

We have learned online how to examine the bits of a double to get the mantissa and exponent, but it's confusing to figure out how to reverse that: to take a mantissa and exponent and fabricate a double from the bits.

Any suggestions?

[Test]
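public void ChangeBitsInDouble()
{
    var original = 1.0D;
    // Reinterpret the 64 bits of the double as a long.
    long bits = BitConverter.DoubleToInt64Bits(original);
    // The sign is the top bit (not used further in this test).
    var negative = (bits < 0);
    // 11-bit biased exponent and 52-bit stored fraction.
    var exponent = (int) ((bits >> 52) & 0x7ffL);
    var mantissa = bits & 0xfffffffffffffL;
    if (exponent == 0)
    {
        // Subnormal: no implicit leading 1; the effective exponent is 1.
        exponent++;
    }
    else
    {
        // Normal: restore the implicit leading 1 bit.
        mantissa = mantissa | (1L << 52);
    }
    // Remove the bias (1023) and account for the 52 fraction bits,
    // so that value = mantissa * 2^exponent.
    exponent -= 1075;

    if (mantissa == 0)
    {
        return;
    }

    // Strip trailing zero bits so the mantissa is as small as possible.
    while ((mantissa & 1) == 0)
    {
        mantissa >>= 1;
        exponent++;
    }

    Console.WriteLine("Mantissa " + mantissa + ", exponent " + exponent);
}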

Upvotes: 8

Views: 4335

Answers (2)

Jon Hanna

Reputation: 113322

As you've already realised as per the other answer, doubles work by floating-point binary rather than floating-point decimal, and therefore the initial approach doesn't work.
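To illustrate the binary point, here is a minimal sketch of the reverse of the question's test: assembling a double from a sign, a binary mantissa and a binary exponent with `BitConverter.Int64BitsToDouble`. The class and method names (`DoubleBits`, `Compose`) are hypothetical, and the sketch assumes the same convention as the question (value = mantissa * 2^exponent). It handles only normal doubles and does no rounding.

```csharp
using System;

static class DoubleBits
{
    // Hypothetical helper: returns (+/-) significand * 2^exponent as a double.
    // Assumes 0 <= significand < 2^53 and a result in the normal range;
    // subnormals, infinities and rounding are deliberately not handled.
    public static double Compose(bool negative, long significand, int exponent)
    {
        long sign = negative ? long.MinValue : 0L;   // top bit is the sign
        if (significand == 0)
            return BitConverter.Int64BitsToDouble(sign);
        if (significand < 0 || significand >= (1L << 53))
            throw new ArgumentOutOfRangeException(nameof(significand));

        // Normalise so that the leading 1 sits at bit 52.
        while (significand < (1L << 52))
        {
            significand <<= 1;
            exponent--;
        }

        // Re-apply the bias (1023) plus the 52 fraction bits.
        long biased = exponent + 1075L;
        if (biased < 1 || biased > 0x7FE)
            throw new ArgumentOutOfRangeException(nameof(exponent));

        // Drop the implicit leading 1 and pack sign | exponent | fraction.
        long bits = sign | (biased << 52) | (significand & 0xFFFFFFFFFFFFFL);
        return BitConverter.Int64BitsToDouble(bits);
    }
}
```

For example, `DoubleBits.Compose(false, 3, -1)` gives 1.5. Note that the inputs here are binary: a value such as 10^-9 has no finite binary expansion, so a decimal mantissa and exponent still have to be rounded to the nearest binary pair before anything like this can be applied.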

It's also not clear if it could work with a deliberately simplified formula, because it's not clear what the maximum range you need is, so rounding becomes inevitable.

The problem of doing so quickly but precisely is well-studied and often supported by CPU instructions. Your only chance of beating the built-in conversions is either:

  1. You hit a mathematical breakthrough that's worthy of some serious papers being written about it.
  2. You exclude enough cases that won't occur in your own use that, while the built-ins are better in general, yours is optimised for your particular case.

Unless the range of values you use is very limited, the potential for short-cutting on conversion between double-precision IEEE 754 and long integer becomes smaller and smaller.

If you're at the point where you have to cover most of the cases IEEE 754 covers, or even a sizable proportion of them, then you'll end up making things slower.

I'd recommend either staying with what you have, changing the cases where double is more convenient to use long anyway despite the inconvenience, or if necessary using decimal. You can create a decimal from a long easily with:

private static decimal DivideByBillion (long l)
{
  if(l >= 0)
   return new decimal((int)(l & 0xFFFFFFFF), (int)(uint)(l >> 32), 0, false, 9);
  l = -l;
  return new decimal((int)(l & 0xFFFFFFFF), (int)(uint)(l >> 32), 0, true, 9);
}

Now, decimal is magnitudes slower to use in arithmetic than double (precisely because it implements an approach similar to yours in the opening question, but with a varying exponent and larger mantissa). But if you need just a convenient way to obtain a value for display or rendering to string, then hand-hacking the conversion to decimal has advantages over hand-hacking the conversion to double, so it could be worth looking at.
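As a rough usage sketch for the display case, the helper can be wrapped like this. `DecimalDisplay` and `Format` are hypothetical names; the helper body is the one from above, with an `unchecked` block added as an assumption so the bit-truncating casts also work in builds compiled with overflow checking.

```csharp
using System;
using System.Globalization;

static class DecimalDisplay
{
    // Same construction as DivideByBillion above: low 32 bits, next 32 bits,
    // sign flag, and a scale of 9 (nine digits after the decimal point).
    public static decimal DivideByBillion(long l)
    {
        unchecked // added here: the casts intentionally keep only the low bits
        {
            bool negative = l < 0;
            if (negative) l = -l;
            return new decimal((int)(l & 0xFFFFFFFF), (int)(uint)(l >> 32), 0, negative, 9);
        }
    }

    // Hypothetical convenience wrapper: culture-independent string rendering.
    public static string Format(long raw)
    {
        return DivideByBillion(raw).ToString(CultureInfo.InvariantCulture);
    }
}
```

Because the scale is fixed at 9, the string always carries nine fractional digits: `DecimalDisplay.Format(1500000000L)` returns "1.500000000" and `DecimalDisplay.Format(-1L)` returns "-0.000000001".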

Upvotes: 0

erikkallen

Reputation: 34411

You shouldn't use a scale factor of 10^9; you should use 2^30 instead.
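A minimal sketch of what that might look like, assuming the long holds the value multiplied by 2^30 (the names `BinaryFixedPoint`, `ToDouble` and `FromDouble` are hypothetical). Since the scale is a power of two, converting to double only adjusts the exponent, so it is exact for any raw value whose magnitude is below 2^53.

```csharp
using System;

static class BinaryFixedPoint
{
    // 2^30 and its reciprocal are both exactly representable as doubles.
    private const double Scale = 1073741824.0;        // 2^30
    private const double InverseScale = 1.0 / Scale;  // 2^-30

    // Exact for |raw| < 2^53: multiplying by a power of two changes only
    // the exponent, so no rounding step is needed.
    public static double ToDouble(long raw)
    {
        return raw * InverseScale;
    }

    // Going the other way rounds to the nearest multiple of 2^-30.
    public static long FromDouble(double value)
    {
        return (long)Math.Round(value * Scale);
    }
}
```

The trade-off is that the resolution is 2^-30 (roughly 9.3e-10) rather than exactly 10^-9, and decimal fractions such as 0.1 are no longer exactly representable in the long.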

Upvotes: 1
