double rounding

Question

This code rounds 62 down to 61 and shows it in the output. Why does it round, and how can I get 62 in the output?

var
  d: Double;
  i: Integer;
begin
  d := 0.62;
  i := Trunc(d*100);
  ShowMessage(IntToStr(i));
end;

David Heffernan · Accepted Answer

This boils down to the fact that 0.62 is not exactly representable in a binary floating point data type. The closest representable double value to 0.62 is:

0.61999 99999 99999 99555 91079 01499 37383 83054 73327 63671 875

When you multiply this value by 100, the resulting value is slightly less than 62. What happens next depends on how the intermediate value d*100 is treated. In your program, under the 32 bit Windows compiler with default settings, the intermediate value is held in an 80 bit extended register. The closest 80 bit extended precision value is:

61.99999 99999 99999 55591 07901 49937 38383 05473 32763 67187 5

Because this value is less than 62 and Trunc rounds towards zero, Trunc returns 61.

If you stored d*100 in a double value, then you'd see a different result.

d := 0.62;
d := d*100;
i := Trunc(d);
Writeln(i);

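This program outputs 62 rather than 61. That's because although d*100 to extended 80 bit precision is less than 62, the closest double precision value to that 80 bit value is in fact 62. Doubles just below 62 are spaced 2^-47 (about 7.1e-15) apart, while the 80 bit value falls short of 62 by only about 4.4e-16, so storing it to a double rounds it up to exactly 62.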

Similarly, if you compile your original program with the 64 bit compiler, then arithmetic is performed in the SSE unit which has no 80 bit registers. And so there is no 80 bit intermediate value and your program outputs 62.

Or, going back to the 32 bit compiler, you can arrange for intermediate values to be held at double (64 bit) precision on the FPU, which also gives an output of 62. Call Set8087CW($1232) to achieve that.
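A minimal sketch of the Set8087CW approach, assuming the 32 bit Windows compiler and a console application. The program name and the SavedCW variable are illustrative only. Saving and restoring the control word is an extra precaution, so that other code relying on the default precision is not affected.

```delphi
program ControlWordDemo;

{$APPTYPE CONSOLE}

var
  d: Double;
  i: Integer;
  SavedCW: Word; // illustrative name: holds the previous FPU control word
begin
  SavedCW := Get8087CW;   // remember the current FPU settings
  Set8087CW($1232);       // precision control set to double rather than extended
  try
    d := 0.62;
    i := Trunc(d*100);    // the intermediate d*100 is now rounded to double precision
    Writeln(i);
  finally
    Set8087CW(SavedCW);   // restore the previous settings
  end;
end.
```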

As you can see, binary floating point arithmetic can sometimes be surprising.

If you use Round rather than Trunc then the value returned will be the closest integer, rather than rounding towards zero as Trunc does.
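As a short sketch of the difference (the program name is illustrative), the following prints the result of both functions for the same expression. The Trunc result depends on the compiler and FPU settings as described above, whereas Round should give 62 in either case because d*100 is far closer to 62 than to 61.

```delphi
program RoundVsTrunc;

{$APPTYPE CONSOLE}

var
  d: Double;
begin
  d := 0.62;
  // Trunc discards the fractional part (rounds towards zero), so a value
  // that is even slightly below 62 becomes 61.
  Writeln(Trunc(d*100));
  // Round returns the nearest integer, so a value slightly below 62
  // still becomes 62.
  Writeln(Round(d*100));
end.
```

Note that Round resolves exact .5 ties using the FPU rounding mode, which by default is round-half-to-even, so it is not a drop-in replacement if you need a specific tie-breaking rule.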

But perhaps a better solution would be to use a decimal data type rather than a binary data type. If you do that then you can represent 0.62 exactly and thereby avoid all such problems. Delphi's built-in decimal real-valued data type is Currency.
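A minimal sketch of the Currency approach, assuming a console application (the program name is illustrative). Currency is a fixed-point type stored as a 64 bit integer scaled by 10000, so 0.62 is held exactly and the truncation is expected to give 62.

```delphi
program CurrencyDemo;

{$APPTYPE CONSOLE}

var
  c: Currency;
  i: Integer;
begin
  c := 0.62;          // stored exactly as the scaled integer 6200
  i := Trunc(c*100);  // no binary representation error to push the value below 62
  Writeln(i);
end.
```

Bear in mind that Currency keeps only four decimal places, so it suits monetary and similar fixed-precision values rather than general-purpose arithmetic.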
