lppier

Reputation: 1987

Am I doing double to float conversion here

const double dBLEPTable_8_BLKHAR[4096] = {
  0.00000000000000000000000000000000,
  -0.00000000239150987901837200000000,
  -0.00000000956897738824125100000000,
  -0.00000002153888378764179400000000,
  -0.00000003830892270073604800000000,
  -0.00000005988800189093979000000000,
  -0.00000008628624126316708500000000,
  -0.00000011751498329992671000000000,
  -0.00000015358678995269770000000000,
  -0.00000019451544774895524000000000,
  -0.00000024031597312124120000000000,
  -0.00000029100459975062165000000000
};

If I change the double above to float, am I incurring conversion CPU cycles when I perform operations on the array contents? Or is the "conversion" sorted out at compile time?

Say, dBLEPTable_8_BLKHAR[1] + dBLEPTable_8_BLKHAR[2] , something simple like this?

On a related note, how many trailing decimal places should a float be able to store?

This is C++.

Upvotes: 0

Views: 215

Answers (3)

4386427

Reputation: 44274

Ben Voigt's answer addresses most of your question. But you also ask:

On a related note, how many trailing decimal places should a float be able to store

It depends on the magnitude of the number you are trying to store. For large numbers there are no decimal places at all; in fact, the format can't even represent every integer exactly. For instance:

float x = BIG_NUMBER;
float y = x + 1;
if (x == y)
{
    // The code gets here if BIG_NUMBER is very large!
}
else
{
    // The code gets here if BIG_NUMBER is not so large!
}

If BIG_NUMBER is 2^23 the next greater number would be (2^23 + 1).

If BIG_NUMBER is 2^24 the next greater number would be (2^24 + 2).

The value (2^24 + 1) can not be stored.
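
The two cases above can be checked with a small sketch. It assumes float is IEEE 754 single precision with the default round-to-nearest mode; gapAbove and addingOneIsLost are hypothetical helper names chosen for illustration only.

```cpp
#include <cmath>

// Hypothetical helper: distance from x to the next representable float above it.
float gapAbove(float x)
{
    return std::nextafter(x, INFINITY) - x;
}

// Hypothetical helper: true when adding 1 to x leaves the stored float unchanged.
bool addingOneIsLost(float x)
{
    // volatile forces the sum to be rounded to float before the comparison
    volatile float y = x + 1.0f;
    return y == x;
}
```

Under those assumptions, gapAbove(8388608.0f) (2^23) should give 1 and gapAbove(16777216.0f) (2^24) should give 2, so addingOneIsLost is true only for the second value.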

For very small numbers (i.e. close to zero), you will have a lot of decimal places.

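Floating point must be used with care because its precision is limited: in the usual IEEE 754 single-precision format a float carries a 24-bit significand, which is roughly 7 significant decimal digits no matter where the decimal point falls.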
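
If you would rather ask the compiler than count bits by hand, std::numeric_limits reports these figures. This is a minimal sketch that assumes an IEEE 754 float; the constant names are made up for illustration.

```cpp
#include <limits>

// The numbers below are only meaningful under this assumption.
static_assert(std::numeric_limits<float>::is_iec559, "sketch assumes IEEE 754 float");

// Bits in the significand, counting the implicit leading bit.
constexpr int kFloatSignificandBits = std::numeric_limits<float>::digits;

// Decimal digits guaranteed to survive a text -> float -> text round trip.
constexpr int kFloatSafeDigits = std::numeric_limits<float>::digits10;

// Decimal digits needed to print a float so that it reads back exactly.
constexpr int kFloatRoundTripDigits = std::numeric_limits<float>::max_digits10;

// The same round-trip figure for double, for comparison with the table literals.
constexpr int kDoubleRoundTripDigits = std::numeric_limits<double>::max_digits10;
```

On an IEEE 754 platform these come out as 24, 6, 9 and 17. The literals in the question carry about 17 significant digits, so most of them are dropped when the table is stored as float.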

http://en.wikipedia.org/wiki/Single-precision_floating-point_format

For small numbers you can experiment with the program below.

Change the exp variable to set the starting point. The program will show you what the step size is for the range and the first four valid numbers.

#include <cstring>
#include <iostream>
using namespace std;

// Reinterpret a raw 32-bit pattern as a float (assumes 32-bit unsigned int
// and IEEE 754 float). memcpy avoids the undefined behaviour of writing
// through an unsigned int* that aliases a float.
float fromBits(unsigned int bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

int main (int argc, char* argv[])
{
    int exp = -27; // <---  !!!!!!!!!!!
                   // Change this to set starting point for the range
                   // Starting point will be 2 ^ exp

    unsigned int e;

    cout.precision(100);

    // Calculate step size for this range
    e = ((127-23) + exp) << 23;
    cout << "Step size  = " << fixed << fromBits(e) << endl;
    cout << "First 4 numbers in range:" << endl;

    // Calculate first four valid numbers in this range
    e = (127 + exp) << 23;

    for (unsigned int i = 0; i < 4; ++i)
    {
        unsigned int d = e | i; // exponent bits plus mantissa 0, 1, 2, 3
        cout << hex << "0x" << d << " = " << fixed << fromBits(d) << endl;
    }

    return 0;
}

For exp = -27 the output will be:

Step size  = 0.0000000000000008881784197001252323389053344726562500000000000000000000000000000000000000000000000000
First 4 numbers in range:
0x32000000 = 0.0000000074505805969238281250000000000000000000000000000000000000000000000000000000000000000000000000
0x32000001 = 0.0000000074505814851022478251252323389053344726562500000000000000000000000000000000000000000000000000
0x32000002 = 0.0000000074505823732806675252504646778106689453125000000000000000000000000000000000000000000000000000
0x32000003 = 0.0000000074505832614590872253756970167160034179687500000000000000000000000000000000000000000000000000

Upvotes: 1

Bill Lynch

Reputation: 81926

const double dBLEPTable_8_BLKHAR[4096] = {

If you change the double in that line to float, then one of two things will happen:

  1. At compile time, the compiler will convert each literal (for example -0.00000000239150987901837200000000) to the float that best represents it, and will then store that data directly into the array.

  2. At runtime, during the program initialization (before main() is called!) the runtime that the compiler generated will fill that array with data of type float.

Either way, once you get to main() and to code that you've written, all of that data will be stored as float variables.
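
To illustrate the first case, here is a minimal sketch using two of the literals from the question. The names kTableAsFloat and relativeErrorAsFloat are hypothetical, and the error bound mentioned afterwards assumes IEEE 754 float with round-to-nearest.

```cpp
#include <cmath>

// Two literals from the question, stored as float. The literals themselves
// are double constants; constexpr forces the conversion to float to happen
// during compilation.
constexpr float kTableAsFloat[2] = {
    -0.00000000239150987901837200000000,
    -0.00000000956897738824125100000000
};

// This check is evaluated by the compiler, so the converted values must
// already exist at compile time.
static_assert(kTableAsFloat[1] < kTableAsFloat[0], "entries are compile-time constants");

// Hypothetical helper: relative error introduced by storing d as a float.
double relativeErrorAsFloat(double d)
{
    float f = static_cast<float>(d);
    return std::fabs((static_cast<double>(f) - d) / d);
}
```

For values in float's normal range, relativeErrorAsFloat should never exceed 2^-24 (about 6e-8), which is the price of dropping from a 53-bit to a 24-bit significand.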

Upvotes: 0

Ben Voigt

Reputation: 283634

Any good compiler will convert the initializers during compile time. However, you also asked

am I incurring conversion cpu cycles when I perform operations on the array contents?

and that depends on the code performing the operations. If your expression combines array elements with variables of double type, then the operation will be performed at double precision, and the array elements will be promoted (converted) before the arithmetic takes place.

If you just combine array elements with variables of float type (including other array elements), then the operation is performed on floats and the language doesn't require any promotion. (If your hardware only implements double precision operations, conversion might still be done, but such hardware surely makes the conversions very cheap.)
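
The language-level rule can be checked at compile time. This is a minimal sketch with a made-up three-element stand-in for the table (kTable and sumOfTwoEntries are hypothetical names); it says nothing about what a particular CPU does underneath.

```cpp
#include <type_traits>

// Stand-in for the table from the question, declared as float.
const float kTable[3] = {0.0f, -0.25f, -0.5f};

// float + float stays float: the language requires no promotion.
static_assert(std::is_same<decltype(kTable[1] + kTable[2]), float>::value,
              "float + float is float");

// An unsuffixed literal such as 2.0 is a double, so the float operand is
// promoted and the arithmetic is done in double.
static_assert(std::is_same<decltype(kTable[1] * 2.0), double>::value,
              "float * double is double");

// The f suffix keeps the whole expression in float.
static_assert(std::is_same<decltype(kTable[1] * 2.0f), float>::value,
              "float * float is float");

// The expression from the question, evaluated entirely in float.
float sumOfTwoEntries()
{
    return kTable[1] + kTable[2];
}
```

An easy way to trigger the promotion by accident is to mix float data with unsuffixed constants like 2.0 or 0.5.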

Upvotes: 2
