Rollie

Reputation: 4752

Is (float)(1.2345f * 6.7809) more accurate than 1.2345f * 6.7809f?

I have some blocks of code that do:

float total = <some float>;
double some_dbl = <some double>;

total *= some_dbl;

This elicits a compiler warning which I want to shut up, but I don't like turning off such warnings. Instead, I would rather explicitly cast types as needed. Which got me thinking: is (float)(total * some_dbl) more accurate than total * (float)some_dbl? Is it compiler or platform specific?

Better code example (linked below):

#include <iostream>
#include <iomanip>
#include <cmath>
using namespace std;

int main() {
    double d_total = 1.2345678;
    float f_total = (float)d_total;
    double some_dbl = 6.7809123;

    double actual = (d_total * some_dbl);
    float no_cast = (float)(f_total * some_dbl);
    float with_cast = (float)(f_total * (float)some_dbl);

    cout << "actual:               " << setprecision(25) << actual << endl;
    cout << "no_cast:              " << setprecision(25) << no_cast << endl;
    cout << "with_cast:            " << setprecision(25) << with_cast << endl;
    cout << "no_cast, nextafter:   " << setprecision(25) << nextafter(no_cast, 500.0f) << endl;

    cout << endl;

    cout << "Diff no_cast:   " << setprecision(25) << actual - no_cast << endl;
    cout << "Diff with_cast: " << setprecision(25) << with_cast - actual << endl;
    return 0;
}

Edit: So, I gave this a shot. With the examples I tried, I did find one quickly where total * (float)(some_dbl) appears to be more accurate. I assume this isn't going to always be the case, but is instead luck of the draw, or the compiler is truncating doubles to get to float, rather than rounding, causing potentially worse results. See: http://ideone.com/sRXj1z

Edit 2: I confirmed using std::nextafter that (float)(total * some_dbl) is returning the truncated value, and updated the linked code. It is quite surprising: if the compiler in this case is always truncating doubles, then you can say (float)some_dbl <= some_dbl, which then implies with_cast <= no_cast. However, this is not the case! with_cast is not only greater than no_cast, but it is also closer to the actual value, which is kinda surprising, given that we are discarding information before the multiplication occurs.

Upvotes: 11

Views: 888

Answers (4)

M.M

Reputation: 141618

Based on the figures from your code dump, two adjacent possible values of float are:

        d1 =  8.37149524...
        d2 =  8.37149620...

The result of doing the multiplication in double precision is:

              8.37149598...

which is in between those two, of course. Whether converting this result to float "rounds" up or down is implementation-defined. In your code results, the conversion has selected d1, which is permitted, even though it is not the closest. The float-only multiplication (with_cast) ended up with d2.

So we can conclude, somewhat unintuitively, that doing a calculation of doubles in double precision and then converting to float is in some cases less accurate than doing it entirely in float precision!

Upvotes: 1

Cory Nelson

Reputation: 30001

It will make a difference depending on the size of the numbers involved, because double is not just about more precision but can also hold numbers larger than float. Here's a sample that will show one such instance:

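#include <cfloat>
#include <cstdio>

int main() {
    double d = FLT_MAX * 2.0;   // fits in a double, too large for a float
    float f = 1.0f / FLT_MAX;

    printf("%f\n", d * f);            // multiply in double
    printf("%f\n", (float)d * f);     // d is narrowed first and no longer fits
    printf("%f\n", (float)(d * f));   // only the final result is narrowed
}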

And the output:

2.000000
inf
2.000000

This happens because, while float can obviously hold the result of the computation (2.0), it cannot hold the intermediate value of FLT_MAX * 2.0.

Upvotes: 10

Tyler

Reputation: 1837

I tested it and they aren't equal. The program below prints 1 (true). http://codepad.org/3GytxbFK

#include <iostream>

using namespace std;

int main(){
  double a = 1.0/7;
  float b = 6.0f;
  float c = 6.0f;
  b = b * (float)a;
  c = (float)((double)c * a);
  cout << (b-c != 0.0f) << endl;
  return 0;
}

This leads me to reason: the cast of the multiplication result from double to float has a better chance to round correctly. Some bits can fall off the end in the float multiplication that would have been correctly accounted for when the multiplication is carried out on doubles and then cast to float.

BTW, I chose 1/7*6 because it repeats in binary.

Edit: Upon research, it seems the rounding should be the same for both conversion from double to float and for multiplication of floats, at least in an implementation conforming to IEEE 754. https://en.wikipedia.org/wiki/Floating_point#Rounding_modes

Upvotes: 1

Patrick

Reputation: 141

In a mixed operation, the compiler converts the operands to the largest data type involved in that operation. Here it is double. In my opinion, (float)(var1f * var2) is therefore the more accurate form.

Upvotes: 2
