Reputation: 923
Suppose I have some variable to double precision, x, that I read from a binary file. When printed this variable x = 384.6433096437924
I want to 'bump' it to quad precision.
integer, parameter :: qp = selected_real_kind(32,307)
integer, parameter :: dp = selected_real_kind(15,307)
real(kind=dp) :: x_double
real(kind=qp) :: x_quad
x_double = 384.6433096437924
x_quad = real(x_double, kind=dp)
print *, x_double, x_quad, 384.64330964379240_qp
This outputs,
384.6433096437924, 384.64330964379240681882947683334351, 384.643309643792400000000000000000009
Which of the quad numbers is the 'best' - i.e. most accurate, consistent etc. value to use? The one with 'junk' or the one without?
I understand (I think!) the issues relating to the exact representation of a float on a computer, but cannot see which is the way to go.
I compile as gfortran -fdefault-real-8
Upvotes: 1
Views: 434
Reputation: 224335
The one with what you call “junk” has the least loss of information from the double
value.
When 384.6433096437924 is converted to double-precision floating-point, the result is a value in a binary-based format, and that value is different from the original value. If you know the original value had 16 significant decimal digits, then printing the double
with 16 decimal digits can produce the original value (in most cases), and that makes sense.
However, if the number has been through various computations, and there is no reason to believe it represents a number that has some exact number of decimal digits, then truncating or altering its value for display does not provide any useful information. It may make it easier for a human to read, but the entire computed value is the best information we have about the result, so printing the entire value conveys the most information.
Alternatively, if the original number can be converted directly to quad precision without going through double precision, then the error from the conversion will generally be less, and this method should be preferred.
Ultimately, which form is best for display depends on the purposes of the display. Do you want to preserve the most information for future use? Or do you want to make the result easily readable by a human?
Finally, the “junk” you show is strange. The result of converting 384.6433096437924 to IEEE-754 basic 64-bit binary floating-point should be 384.64330964379240640482748858630657196044921875, but you show 384.64330964379240681882947683334351. I do not see how that value arose.
Upvotes: 1