aka.nice
aka.nice

Reputation: 9382

should ldexp round correctly

I'm a bit surprised with MSVC ldexp behavior (it happens in Visual Studio 2013, but also with all older versions at least down to 2003...).

For example:

#include <math.h>
#include <stdio.h>
int main()
{
    double g=ldexp(2.75,-1074);
    double e=ldexp(3.0,-1074);

    printf("g=%g e=%g \n",g,e);
    return 0;
}

prints

g=9.88131e-324 e=1.4822e-323

The first one g is strangely rounded...
It is 2.75 * fmin_denormalized, so i definitely expect the second result e.
If I evaluate 2.75*ldexp(1.0,-1074) I correctly get same value as e.

Are my expectations too high, or does Microsoft fail to comply with some standard?

Upvotes: 6

Views: 502

Answers (3)

1201ProgramAlarm
1201ProgramAlarm

Reputation: 32732

This happens because during the ldexp calculation the 2.75 gets truncated to 2, which happens because at that small of a denormalized number the bits that represent the '.75' part get shifted off the end of the representable number and disappear. Whether this is a bug or designed behavior can be debated.

When calculating 2.75*ldexp(1.0,-1074) normal rounding happens, and the 2.75 becomes 3.

EDIT: ldexp should round correctly, and this is a bug.

Upvotes: 2

njuffa
njuffa

Reputation: 26105

While the question does not explicitly state this, I assume that the output expected by the asker is:

g=1.4822e-323 e=1.4822e-323

This is what we would expect from a C/C++ compiler that promises strict adherence to IEEE-754. The question is tagged both C and C++, I will address C99 here as that is the standard I have in hand.

In Annex F, which describes IEC 60559 floating-point arithmetic (where IEC 60559 is basically another name for IEEE-754) the C99 standard specifies:

An implementation that defines __STDC_IEC_559__ shall conform to the specifications in this annex. [...] The scalbn and scalbln functions in <math.h> provide the scalb function recommended in the Appendix to IEC 60559.

Further down in that annex, section F.9.3.6 specifies:

On a binary system, ldexp(x, exp) is equivalent to scalbn(x, exp).

The appendix referenced by the C99 standard is the appendix of the 1985 version of IEEE-754, where we find the scalb function defined as follows:

Scalb(y, N) returns y × 2N for integral values N without computing 2N.

scalb is defined as a multiplication with a power of two, and multiplications must be rounded correctly based on the current rounding mode according to the standard. Therefore, with a conforming C99 compiler ldexp() must return a correctly rounded result if the compiler defines __STDC_IEC_559__. In the absence of a library call setting the rounding mode, the default rounding mode "round to nearest or even" is in effect.

I do not have access to MSVC 2013, so I do not know whether it defines that symbol or not. This could even depend on a compiler flag setting, such as /fp:strict.

After tracking down my copy of the C++11 standard, I cannot find any reference to __STDC_IEC_559__ or any language about IEEE-754 bindings. According to the answer to this question this is because that support is included by referring to the C99 standard.

Upvotes: 6

chux
chux

Reputation: 153517

OP results do not fail to comply with the C spec as that spec does not define the preciseness of calculations.

OP result may have failed IEEE 754, but it depends on the rounding mode in use at the time, which is not posted. Yet OP's reports 2.75*ldexp(1.0,-1074) worked as expected implying at that time, the expected rounding mode was in effect.

Using printf("%la",x) aids in seeing clearly what is happening near the limits of double.

I would expect g to "round to nearest - ties to even" with the result of 0x1.8p-1073 - which did occur with gcc on windows.

Ideally g would have the value of 0x1.6p-1073

0x0.0p-1073 Zero
0x0.8p-1073 next higher double DBL_TRUE_MIN
0x1.0p-1073 next higher double
0x1.6p-1073 ideal `g` answer, but not available as a double
0x1.8p-1073 next higher double

To be fair, it could be a processor bug - it has happened before.


Ref

double g=ldexp(2.75,-1074);
printf("%la\n%la\n", 2.75,ldexp(2.75,-1074));
printf("%la\n%la\n", 3.0 ,ldexp(3.0 ,-1074));
double e=ldexp(3.0,-1074);
printf("%la\n%la\n", g,e);
printf("%la\n%la\n", 9.88131e-324, DBL_TRUE_MIN);
printf("g=%g e=%g \n",g,e);

0x1.6p+1
0x1.8p-1073
0x1.8p+1
0x1.8p-1073
0x1.8p-1073
0x1.8p-1073
0x1p-1073
0x1p-1074
g=1.4822e-323 e=1.4822e-323 

Upvotes: 0

Related Questions