Reputation: 4818
I am trying to understand the representation of integers in floating point format.
Since the IEEE floating-point format has only 23 bits for the mantissa, I expect any integer greater than 1<<22 to be only an approximate representation. This is not what I am observing in g++.
Both of the cout statements below print the same value, 33554432.
Since the mantissa is the part responsible for precision, how can we represent (store) exactly a number that needs more than 23 bits to be stored accurately?
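#include <iostream>
#include <iomanip>
using namespace std;
void floating_point_precision() {
    cout << setprecision(10);
    float fp = (1 << 25);          // int converted to float
    cout << fp << endl;
    cout << (1 << 25) << endl;     // printed as int
}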
As a follow-up based on the answer below: why does the following code not print "Not equal", even though the printed values of fp and i are different?
void floating_point_precision() {
    cout << setprecision(10);
    float fp = ((1 << 25) + 1);
    cout << fp << endl;
    int i = ((1 << 25) + 1);
    cout << i << endl;
    if (i != fp)
        cout << "Not equal" << endl;
}
Upvotes: 3
Views: 4352
Reputation: 263647
It's true that IEEE floating-point only has a limited number of mantissa bits. If there are 23 mantissa bits, then it can represent 2^23 distinct integer values exactly. (IEEE single precision stores 23 mantissa bits plus an implicit leading 1 bit, so in practice every integer up to 2^24 is exact.)
But since floating-point stores a power-of-two exponent separately, it can (subject to the limited exponent range) represent exactly any of those 2^23 values times a power of two.
33554432 is exactly 2^25, so it requires just one mantissa bit to represent it exactly (plus a binary exponent that denotes multiplication by a power of two). Its binary representation is 10000000000000000000000000, which has 26 bits but only 1 significant bit. (Well, actually they're all significant, but you get the idea.)
You'll find that its neighboring integer values 33554431 and 33554433 cannot be represented exactly in 32-bit float. (But they can be represented in 64-bit double.)
More generally, the difference between consecutive representable values of type float varies with the magnitude of the value. On my system (most systems use IEEE format, but the standard doesn't require that), this program:
#include <iostream>
#include <iomanip>
#include <cmath>

// Print the float just below f, f itself, and the float just above f.
void show(float f) {
    std::cout << std::nextafter(f, 0.0f) << "\n"
              << f << "\n"
              << std::nextafter(f, f * 2) << "\n";
    std::cout << '\n';  // blank line between groups
}

int main(void) {
    std::cout << std::setprecision(24);
    show(1);
    show(1 << 23);
    show(1 << 24);
    show(1 << 30);
}
produces this output:
0.999999940395355224609375
1
1.00000011920928955078125

8388607.5
8388608
8388609

16777215
16777216
16777218

1073741760
1073741824
1073741952
It shows the immediate predecessor and successor, in type float, of the numbers 1, 2^23, 2^24, and 2^30. As you can see, the gaps get bigger for larger numbers, with the gap doubling in size at each power of 2.
You'd get similar results, but with smaller gaps, with type double or long double.
Upvotes: 13