user2971421

Reputation: 11

Rounding error with the GNU C compiler

My textbook is C in a Nutshell (ISBN 978-0596006976).

In the section on casting, an example is meant to show a rounding error in C:

Code:

#include <stdio.h>

int
main()
{
  long l_var = 123456789L;
  float f_var = l_var;

  printf("The rounding error (f_var - l_var) is %f\n", f_var - l_var);

  return 0;
}

But the only output I get is 0.000000.

It seems the conversion caused no loss of precision at all.

I compiled with gcc (v4.4.7) using this command:

gcc -Wall file.c -o exec

Does GCC have some better way of avoiding the problem described in that chapter, or is some setting keeping the rounding error from showing up?

Upvotes: 1

Views: 372

Answers (3)

AProgrammer

Reputation: 52334

0 is the value you get if both operands are converted to float; you'll get something else if they are converted to a wider type. The standard allows an implementation to use a wider floating-point representation than the type requires for the computation (*). Using that allowance is especially tempting here, as the result has to be converted to a double anyway to be passed to printf.

My version of gcc does not use that allowance when compiling for x86_64 (-m64 argument for gcc), but it does use it when compiling for x86 (-m32 argument). That makes sense when you know that for 64 bits it uses SSE instructions, which can easily do the computation in float, while for 32 bits it uses the older "8087" stack model, which can't do that easily.


(*) Last paragraph of 6.2.1.5 in C90, 6.3.1.8/2 in C99, 6.3.1.8/2 in C11. I quote the text of the latest (as in n1539):

The values of floating operands and of the results of floating expressions may be represented in greater precision and range than that required by the type; the types are not changed thereby.

As pointed out by Pascal Cuoq, starting from C99 you can check which evaluation method is in use with the FLT_EVAL_METHOD macro from <float.h>.

Upvotes: 0

Ali

Reputation: 58501

I don't have access to this book.

My guess is that the example is trying to tell you that if you assign a 32-bit integer to a 32-bit float, you may lose low-order bits to rounding: a 32-bit IEEE 754 float has only a 24-bit significand (23 bits stored plus one implicit bit), so an integer that needs more bits than that cannot be stored exactly.

The example code in the book appears to be bogus, though: it subtracts in float, so the error cancels out. Here is code that demonstrates the rounding error by doing the subtraction in double:

#include <stdint.h>
#include <stdio.h>

int main() {

  int32_t l_var = 123456789L;

  /* 32-bit variable, 24-bit significand (23 stored), approx. 7 decimals */
  float f_var = l_var;

  double err = (double) f_var - (double) l_var;

  printf("The rounding error (f_var - l_var) is %f\n", err);

  return 0;
}

This prints

The rounding error (f_var - l_var) is 3.000000

on my machine.

Upvotes: 2

Guilherme Bernal

Reputation: 8313

I don't know what this chapter is telling you, but:

float f_var = l_var;

We can tell that f_var is (float)l_var. Now the expression:

f_var - l_var

As this operates on a long and a float, the usual arithmetic conversions apply and the long is converted to float. So the compiler will do:

f_var - (float)l_var

Which is the same as:

(float)l_var - (float)l_var

Which is zero, regardless of any rounding of the conversion.

Upvotes: 4
