Exact representation of integers in floating points

Question

I am trying to understand the representation of integers in floating point format.

Since the IEEE floating-point format has only 23 bits for the mantissa, I expect any integer greater than 1<<22 to have only an approximate representation. That is not what I am observing with g++.

Both of the cout statements below print the same value, 33554432.

Since the mantissa is what determines the precision, how can a number that needs more than 23 bits be stored exactly?

void floating_point_precision(){
  cout<< setprecision(10);
  float fp = (1<<25);
  cout<< fp <



As a follow-up based on the answer below: why does the following code not print "Not Equal", even though the printed values of fp and i are different?

void floating_point_precision(){
  cout<< setprecision(10);
  float fp = ((1<<25)+1);
  cout<< fp <

Keith Thompson · Accepted Answer

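It's true that IEEE floating-point only has a limited number of mantissa bits. If there are 23 mantissa bits, then it can represent 2²³ distinct integer values exactly. (IEEE single precision also has an implicit leading 1 bit in addition to the 23 stored bits, so the effective precision is 24 bits.)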

But since floating-point stores a power-of-two exponent separately, it can (subject to the limited exponent range) represent exactly any of those 2²³ values times a power of two.
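The "small integer times a power of two" rule can be turned into a direct test. This is a minimal sketch, assuming IEEE 754 single precision, where the 23 stored mantissa bits plus the implicit leading 1 give a 24-bit significand; `fits_in_float` is a made-up helper name, not a library function.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of any library): true if the integer n is
// exactly representable as a float. Assumes a 24-bit significand, as in
// IEEE 754 single precision (23 stored bits plus the implicit leading 1).
bool fits_in_float(std::uint64_t n) {
    static_assert(std::numeric_limits<float>::digits == 24,
                  "this sketch assumes a 24-bit float significand");
    if (n == 0) return true;
    // Factors of two are absorbed by the exponent, so strip them.
    while ((n & 1u) == 0) n >>= 1;
    // What remains must fit in the significand.
    return n < (std::uint64_t{1} << 24);
}
```

With this, `fits_in_float(33554432)` is true, while `fits_in_float(33554431)` and `fits_in_float(33554433)` are both false.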

33554432 is exactly 2²⁵, so it requires just one mantissa bit to represent it exactly (plus a binary exponent that denotes multiplication by a power of two). Its binary representation is 10000000000000000000000000, which has 26 bits but only 1 significant bit. (Well, actually they're all significant, but you get the idea.)
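To see this in the stored bits, you can pull the fields out of the float yourself. The sketch below assumes a 32-bit IEEE 754 float and only handles normal (non-zero, non-denormal, finite) values; `FloatFields` and `decompose` are names invented for this illustration.

```cpp
#include <cstdint>
#include <cstring>

// Illustrative names, not from any library.
struct FloatFields {
    std::uint32_t sign;      // 1 bit
    int           exponent;  // unbiased power of two
    std::uint32_t mantissa;  // 23 stored fraction bits
};

// Assumes IEEE 754 binary32 layout and a normal value.
FloatFields decompose(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // well-defined way to read the bits
    return { bits >> 31,
             static_cast<int>((bits >> 23) & 0xFF) - 127,  // remove the bias
             bits & 0x7FFFFF };
}
```

For `decompose(33554432.0f)` the exponent comes out as 25 and the stored mantissa bits are all zero: the single significant bit is the implicit leading 1.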

You'll find that its neighboring integer values 33554431 and 33554433 cannot be represented exactly in 32-bit float. (But they can be represented in 64-bit double.)
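A quick way to check this is to convert the integer to float and see whether the value survives. One caveat: a direct comparison between a float and an int converts the int to float first, so it can report equality even when the printed values differ. The sketch below therefore compares in double, which holds every 32-bit integer exactly on IEEE systems. The helper names are invented for this example.

```cpp
#include <cstdint>

// Illustrative helpers, not from any library.

// True if n keeps its exact value when stored in a float.
// The comparison is done in double; comparing the float with n directly
// would convert n to float as well and hide the rounding.
bool survives_float(std::int32_t n) {
    return static_cast<double>(static_cast<float>(n)) == static_cast<double>(n);
}

// True if n keeps its exact value when stored in a double.
bool survives_double(std::int32_t n) {
    return static_cast<std::int32_t>(static_cast<double>(n)) == n;
}
```

Here `survives_float(33554432)` is true and `survives_float(33554433)` is false, while `survives_double` is true for all three neighbors. By contrast, `static_cast<float>(33554433) == 33554433` compares two floats and can evaluate to true.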

More generally, the difference between consecutive representable values of type float varies with the magnitude of the value. On my system (most systems use IEEE format, but the standard doesn't require that), this program:

#include <cmath>
#include <cstdio>
#include <iomanip>
#include <iostream>

// Print the float just below f, f itself, and the float just above f.
void show(float f) {
    std::cout << std::nextafterf(f, 0.0) << "\n"
              << f << "\n"
              << std::nextafterf(f, f*2) << "\n";
    std::putchar('\n');
}

int main(void) {
    std::cout << std::setprecision(24);

    show(1);
    show(1<<23);
    show(1<<24);
    show(1<<30);
}

produces this output:

0.999999940395355224609375
1
1.00000011920928955078125

8388607.5
8388608
8388609

16777215
16777216
16777218

1073741760
1073741824
1073741952

It shows the immediate predecessor and successor, in type float, of the numbers 1, 2²³, 2²⁴, and 2³⁰. As you can see, the gaps get bigger for larger numbers, with the gap doubling in size at each power of 2.

You'd get similar results, but with smaller gaps, with type double or long double.
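To compare the gaps across types directly, something like the following should work. It is a small sketch built on `std::nextafter` from `<cmath>`; `gap_above` is a name made up here, and the values quoted afterwards assume IEEE 754 float and double.

```cpp
#include <cmath>
#include <limits>

// Illustrative helper: distance from x to the next representable value
// above it, in the same floating-point type T. Expects a finite x.
template <typename T>
T gap_above(T x) {
    return std::nextafter(x, std::numeric_limits<T>::infinity()) - x;
}
```

At 2³⁰, `gap_above(1073741824.0f)` is 128, matching the output above, while `gap_above(1073741824.0)` for double is 2⁻²² (about 2.4e-7). A double does not reach a gap of 2 until 2⁵³.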
