Reputation: 11
My textbook is C in a Nutshell (ISBN 978-0596006976).
In the section on casting, an example is meant to show a rounding error in C:
Code:
#include <stdio.h>
int
main()
{
    long l_var = 123456789L;
    float f_var = l_var;
    printf("The rounding error (f_var - l_var) is %f\n", f_var - l_var);
    return 0;
}
However, the output is just 0.000000,
so the conversion seems to lose no precision at all.
I compiled with gcc (v4.4.7) using the command
gcc -Wall file.c -o exec
Does GCC do something to get around the problem described in that chapter,
or is some setting simply hiding the rounding error?
Upvotes: 1
Views: 372
Reputation: 52334
0 is the value you get if both operands are converted to float; you get something else if they are converted to a wider type. The standard allows an implementation to use a wider floating-point representation than the type requires for computation (*). Using that allowance is especially tempting here, as the result has to be converted to double anyway before it is passed to printf.
My version of gcc does not use that allowance when compiling for x86_64 (the -m64 argument) but does use it when compiling for x86 (the -m32 argument). That makes sense once you know that for 64-bit targets it uses SSE instructions, which can easily do the computation in float, while for 32-bit targets it uses the older "8087" stack model, which cannot do that easily.
(*) Last paragraph of 6.2.1.5 in C90, 6.3.1.8/2 in C99, 6.3.1.8/2 in C11. Here is the text of the latest (as in n1539):
The values of floating operands and of the results of floating expressions may be represented in greater precision and range than that required by the type; the types are not changed thereby.
As Pascal Cuoq pointed out, starting from C99 you can test for this with FLT_EVAL_METHOD.
Upvotes: 0
Reputation: 58501
I don't have access to this book.
My guess is that the example is trying to show that assigning a 32-bit integer to a 32-bit float can lose bits to rounding: a 32-bit float stores only 23 significand bits (24 counting the implicit leading bit), so the low-order bits of a large integer may be lost in the assignment.
The example code in the book appears to be bogus, though. Here is code that demonstrates the rounding error:
#include <stdint.h>
#include <stdio.h>
int main(void) {
    int32_t l_var = 123456789L;
    /* 32-bit float: 24-bit significand (23 bits stored), approx. 7 decimal digits */
    float f_var = l_var;
    /* Widen both operands to double so the subtraction itself is exact */
    double err = (double) f_var - (double) l_var;
    printf("The rounding error (f_var - l_var) is %f\n", err);
    return 0;
}
This prints
The rounding error (f_var - l_var) is 3.000000
on my machine.
Upvotes: 2
Reputation: 8313
I don't know what this chapter is telling you, but:
float f_var = l_var;
We can tell that f_var is (float)l_var. Now the expression:
f_var - l_var
As this operates on a long and a float, the long will be converted to a float. So the compiler will do:
f_var - (float)l_var
Which is the same as:
(float)l_var - (float)l_var
Which is zero, regardless of any rounding of the conversion.
Upvotes: 4