Reputation: 4752
I have some blocks of code that do:
float total = <some float>;
double some_dbl = <some double>;
total *= some_dbl;
This elicits a compiler warning that I want to silence, but I don't like turning off such warnings; I would rather cast explicitly where needed. Which got me thinking: is (float)(total * some_dbl) more accurate than total * (float)some_dbl? Is it compiler- or platform-specific?
Better code example (linked below):
#include <iostream>
#include <iomanip>
#include <cmath>
using namespace std;
int main() {
double d_total = 1.2345678;
float f_total = (float)d_total;
double some_dbl = 6.7809123;
double actual = (d_total * some_dbl);
float no_cast = (float)(f_total * some_dbl);
float with_cast = (float)(f_total * (float)some_dbl);
cout << "actual: " << setprecision(25) << actual << endl;
cout << "no_cast: " << setprecision(25) << no_cast << endl;
cout << "with_cast: " << setprecision(25) << with_cast << endl;
cout << "no_cast, nextafter: " << setprecision(25) << nextafter(no_cast, 500.0f) << endl;
cout << endl;
cout << "Diff no_cast: " << setprecision(25) << actual - no_cast << endl;
cout << "Diff with_cast: " << setprecision(25) << with_cast - actual << endl;
return 0;
}
Edit:
So, I gave this a shot. Among the examples I tried, I quickly found one where total * (float)(some_dbl) appears to be more accurate. I assume this won't always be the case and is instead luck of the draw, or that the compiler is truncating doubles to get to float rather than rounding, which could produce worse results. See: http://ideone.com/sRXj1z
Edit 2: I confirmed using std::nextafter that (float)(total * some_dbl) is returning the truncated value, and updated the linked code. This is quite surprising: if the compiler in this case always truncates doubles, then you can say (float)some_dbl <= some_dbl, which in turn implies with_cast <= no_cast. However, this is not the case! with_cast is not only greater than no_cast, but it is also closer to the actual value, which is surprising given that we are discarding information before the multiplication occurs.
Upvotes: 11
Views: 888
Reputation: 141618
Based on the figures from your code dump, two adjacent possible values of float are:
d1 = 8.37149524...
d2 = 8.37149620...
The result of doing the multiplication in double precision is:
8.37149598...
which of course lies between those two. When this result is converted to float, it is implementation-defined whether the value is rounded up or down. In your results, the conversion selected d1, which is permitted even though it is not the closest. The multiplication carried out entirely in float (with_cast) ended up with d2.
So we can conclude, somewhat unintuitively, that doing a calculation in double precision and then converting to float is in some cases less accurate than doing it entirely in float precision!
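To check which neighbour a given conversion actually picked, the double result can be bracketed with std::nextafter. The following is only a rough sketch: the names FloatBracket, bracket, and chose_nearest are made up for illustration and do not come from the question's code.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper type: the two floats that bracket a double value,
// plus the float the implementation's conversion actually chose.
struct FloatBracket {
    float below;   // largest float <= value
    float above;   // smallest float >= value
    float chosen;  // result of static_cast<float>(value)
};

FloatBracket bracket(double value) {
    const float chosen = static_cast<float>(value);
    FloatBracket r{chosen, chosen, chosen};
    if (static_cast<double>(chosen) > value) {
        // The conversion went up, so the lower neighbour is one step down.
        r.below = std::nextafter(chosen, -std::numeric_limits<float>::infinity());
    } else if (static_cast<double>(chosen) < value) {
        // The conversion went down, so the upper neighbour is one step up.
        r.above = std::nextafter(chosen, std::numeric_limits<float>::infinity());
    }
    return r;
}

// True when the chosen float is at least as close to value as the other neighbour.
bool chose_nearest(double value) {
    const FloatBracket r = bracket(value);
    const double other = (r.chosen == r.below) ? r.above : r.below;
    return std::fabs(r.chosen - value) <= std::fabs(other - value);
}
```

Calling bracket(8.37149598) should give the d1/d2 pair quoted above. On an IEEE 754 implementation in its default round-to-nearest mode I would expect chose_nearest to return true for that value, but the standard does not require it.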
Upvotes: 1
Reputation: 30001
It can make a difference depending on the magnitude of the numbers involved, because double not only offers more precision but can also hold larger numbers than float. Here's a sample that shows one such instance:
#include <cfloat>
#include <cstdio>
int main() {
double d = FLT_MAX * 2.0; // fits in double, too large for float
float f = 1.0f / FLT_MAX;
printf("%f\n", d * f);
printf("%f\n", (float)d * f);
printf("%f\n", (float)(d * f));
}
And the output:
2.000000
inf
2.000000
This happens because, while float can obviously hold the result of the computation (2.0), it cannot hold the intermediate value of FLT_MAX * 2.0.
Upvotes: 10
Reputation: 1837
I tested it, and they aren't equal. The program below prints 1 (true). http://codepad.org/3GytxbFK
#include <iostream>
using namespace std;
int main(){
double a = 1.0/7;
float b = 6.0f;
float c = 6.0f;
b = b * (float)a;
c = (float)((double)c * a);
cout << (b-c != 0.0f) << endl;
return 0;
}
This leads me to reason that casting the double result of the multiplication to float has a better chance of rounding correctly. Some bits that fall off the end in the float multiplication would have been correctly accounted for when the multiplication is carried out on doubles and then cast to float.
BTW, I chose 1/7*6 because it repeats in binary.
Edit: Upon research, it seems the rounding should be the same for both conversion from double to float and for multiplication of floats, at least in an implementation conforming to IEEE 754. https://en.wikipedia.org/wiki/Floating_point#Rounding_modes
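One way to see where the difference in the test above comes from: assuming IEEE 754 binary32 and binary64, the product of two floats needs at most 48 significand bits and is therefore exact in double, so rounding it to float gives the same value as multiplying in float directly. The discrepancy should then come only from rounding a to float before the multiplication. The function names below are made up for this sketch.

```cpp
#include <limits>

static_assert(std::numeric_limits<float>::digits == 24 &&
              std::numeric_limits<double>::digits == 53,
              "sketch assumes IEEE 754 binary32 and binary64");

// Product of two floats, rounded once in float arithmetic.
float mul_in_float(float x, float y) {
    return x * y;
}

// Same product computed exactly in double (24 + 24 <= 53 bits), then rounded to float.
float mul_via_double(float x, float y) {
    return static_cast<float>(static_cast<double>(x) * static_cast<double>(y));
}

// The variant used for c in the test: the double operand keeps its extra bits.
float mul_keep_double(float x, double y) {
    return static_cast<float>(static_cast<double>(x) * y);
}
```

Under those assumptions mul_in_float and mul_via_double should agree for every pair of floats, while mul_keep_double(6.0f, 1.0/7) can differ from mul_in_float(6.0f, (float)(1.0/7)) because the second call has already discarded bits of 1.0/7.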
Upvotes: 1
Reputation: 141
In an arithmetic operation with mixed types, the compiler converts the operands to the larger type involved in that operation. Here that is double. In my opinion, the operation (float)(var1f * var2) is therefore the more accurate one.
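The conversion rule can be checked at compile time. This is a small sketch; scale is a hypothetical function name used only for illustration.

```cpp
#include <type_traits>

// float * double: the float operand is converted to double, so the product is a double.
static_assert(std::is_same<decltype(1.0f * 2.0), double>::value,
              "float * double yields double");

// float * float keeps the type float.
static_assert(std::is_same<decltype(1.0f * 2.0f), float>::value,
              "float * float yields float");

// Multiply in double, then narrow once with an explicit cast,
// which also silences the implicit-conversion warning.
float scale(float total, double factor) {
    return static_cast<float>(total * factor);
}
```

Writing total = scale(total, some_dbl); in place of total *= some_dbl; keeps the double-precision multiply and makes the narrowing explicit.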
Upvotes: 2