IEEE 754 Addition of two 32-bit floating point numbers (-1 and 2^(-50) )

Question

Consider the following piece of C++ Code:

#include <iostream>
#include <cmath>

using namespace std;

int main()
{
    cout.precision(1000000000);
    
    float a,b,c;
    
    a = 1;
    b = -1;
    c = pow(2, -50);
    
    cout << "a = " << a << endl;
    cout << "b = " << b << endl;
    cout << "c = " << c << endl;
    
    float ab = a + b;
    float bc = b + c;
    float abc = ab + c;
    float bca = bc + a;
    
    cout << "a + b = " << ab << endl;
    cout << "b + c = " << bc << endl;
    cout << "(a + b) + c = " << abc << endl;
    cout << "(b + c) + a = " << bca << endl;

    return 0;
}

Which yields the output:

a = 1
b = -1
c = 8.8817841970012523233890533447265625e-16
a + b = 0
b + c = -1
(a + b) + c = 8.8817841970012523233890533447265625e-16
(b + c) + a = 0

Why is b + c = -1?

I am not getting my head around this effect of the IEEE 754 standard.

To my understanding the exponent ranges from -126 to 127. (8 bit for the biased exponent with a bias of 127.)

So 2^(-50) is representable without an issue, as are 1 and -1. None of them is a subnormal (denormalized) number, if I understand the standard correctly.

But why does the addition -1 + 2^(-50) result in -1, so that the smaller number is simply dropped?

Thanks in advance for any help!

VorpalSword · Accepted Answer

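IEEE 754 single precision uses 1 sign bit, 8 exponent bits and 23 stored fraction bits; together with the implicit leading 1 that gives a 24-bit significand (mantissa). When two numbers are added, the significand of the smaller one is shifted right to align it with the exponent of the larger one, so 2^-50 is a 1 shifted right by 50 bits relative to 1. That puts it far outside the 24 bits available for the result, and the sum rounds back to -1. You can check this by repeating your experiment with 2^-24, which still changes the result, and 2^-25, which no longer does.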
