I'm really curious about how a double-precision floating-point number is stored.
These are the things I figured out so far.
However, I do not understand what the exponent and the exponent bias are, or all those formulas on the Wikipedia page.
Can anyone explain what all those things are, how they work, and how they are eventually turned into the real number, step by step?
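```cpp
#include <cmath>
#include <iostream>
using namespace std;

int main()
{
    double num = 5643.0662;
    int sign = 0;             // 0 = positive
    int exponent = 1035;      // biased: 5643.0662 = 1.377... * 2^12, and 12 + 1023 = 1035
    int exponent_bias = 1023;
    // The mantissa is NOT the decimal part (0.0662); it is the fraction left after
    // normalizing to 1.xxx form: 5643.0662 / 2^12 - 1 = 0.3777...
    double mantissa = num / pow(2, exponent - exponent_bias) - 1.0;
    double x = pow(-1, sign) * pow(2, exponent - exponent_bias) * (1 + mantissa);
    double y = num - x;       // difference; 0 when the fields are right
    cout << "\nValue of x is : " << x << endl;
    cout << "\nValue of y is : " << y << endl;
    return 0;
}
```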
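Rather than working the fields out by hand, they can be read straight out of the 64 bits. This is a minimal sketch, assuming `double` is IEEE 754 binary64 (true on common platforms); `DoubleFields`, `split_double` and `rebuild_normal` are hypothetical helper names, and only normal numbers are handled.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Fields of an IEEE 754 binary64 value, as stored in memory.
struct DoubleFields {
    int sign;               // 1 bit
    int biased_exponent;    // 11 bits
    std::uint64_t fraction; // 52 bits: the mantissa without its hidden leading 1
};

DoubleFields split_double(double value)
{
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // reinterpret the 64 bits without aliasing UB
    return DoubleFields{
        static_cast<int>(bits >> 63),
        static_cast<int>((bits >> 52) & 0x7FF),
        bits & ((std::uint64_t{1} << 52) - 1)};
}

// Rebuild a normal double as (-1)^sign * 2^(exponent - 1023) * 1.fraction.
double rebuild_normal(const DoubleFields& f)
{
    // Exponent 0 (zero/denormal) and 0x7FF (INF/NaN) are not covered here.
    assert(f.biased_exponent != 0 && f.biased_exponent != 0x7FF);
    const int exponent_bias = 1023;
    double mantissa = std::ldexp(static_cast<double>(f.fraction), -52);  // fraction / 2^52
    double magnitude = std::ldexp(1.0 + mantissa, f.biased_exponent - exponent_bias);
    return f.sign ? -magnitude : magnitude;
}
```

For 5643.0662 this should give sign 0 and biased exponent 1035, with a fraction of about 0.3777 rather than 0.0662; rebuilding from those fields gives back the original value exactly.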
Find a fraction and an exponent e such that fraction * 2^e is equal to the number you want to represent. An example (in single precision, because it is more comfortable for me to write):
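If I had to represent -0.75, I would do:
- the binary representation is -0.11 = -11 * 2^-2 = -1.1 * 2^-1
- the sign bit is 1 (the number is negative)
- the biased exponent is -1 + 127 = 126 -> 01111110
- the fraction bits are what follows the leading 1 of 1.1, padded to 23 bits: 10000000000000000000000
So we have -0.75 = 1 01111110 10000000000000000000000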
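The same packing can be checked in code. This is a minimal sketch, assuming `float` is IEEE 754 binary32; `pack_float_bits`, `bits_to_float` and `minus_0_75_bits` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Pack sign (1 bit), biased exponent (8 bits) and fraction (23 bits) into 32 bits.
std::uint32_t pack_float_bits(std::uint32_t sign, std::uint32_t biased_exponent, std::uint32_t fraction)
{
    assert(sign <= 1 && biased_exponent <= 0xFF && fraction <= 0x7FFFFF);
    return (sign << 31) | (biased_exponent << 23) | fraction;
}

float bits_to_float(std::uint32_t bits)
{
    float f;
    std::memcpy(&f, &bits, sizeof f);  // assumes float is IEEE 754 binary32
    return f;
}

// -0.75 = -1.1 (binary) * 2^-1:
//   sign = 1, biased exponent = -1 + 127 = 126, fraction = 1 followed by 22 zeros
std::uint32_t minus_0_75_bits()
{
    return pack_float_bits(1, -1 + 127, std::uint32_t{1} << 22);
}
```

The resulting pattern is 0xBF400000, which is 1 01111110 10000000000000000000000 written in hexadecimal.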
For a sum, you have to align the exponents and then you can add the fractional parts.
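To illustrate aligning exponents, here is a sketch using the standard `std::frexp` and `std::ldexp`. Note that `frexp` normalizes to a fraction in [0.5, 1) rather than the 1.xxx form used above, and real hardware adds integer significands with extra rounding bits, so this only mirrors the steps; `add_by_aligning` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Illustration of "align the exponents, then add the fractions".
double add_by_aligning(double a, double b)
{
    assert(std::isfinite(a) && std::isfinite(b));
    int ea, eb;
    double ma = std::frexp(a, &ea);  // a = ma * 2^ea, with 0.5 <= |ma| < 1 (or ma == 0)
    double mb = std::frexp(b, &eb);
    // Let (ma, ea) be the operand with the larger exponent.
    if (ea < eb) {
        std::swap(ma, mb);
        std::swap(ea, eb);
    }
    // Align: mb * 2^eb == (mb * 2^(eb - ea)) * 2^ea, i.e. shift mb right by ea - eb places.
    mb = std::ldexp(mb, eb - ea);
    // Both fractions now share exponent ea, so they can be added directly;
    // ldexp scales the sum back by 2^ea.
    return std::ldexp(ma + mb, ea);
}
```

For example, 1.5 = 0.75 * 2^1 and 0.375 = 0.75 * 2^-1; aligning the second gives 0.1875 * 2^1, and the sum is 0.9375 * 2^1 = 1.875.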
For multiplication you have to
Check out the formula a little further down the page:
Except for the above exceptions, the entire double-precision number is described by:
(-1)^sign * 2^(exponent - bias) * 1.mantissa
The formula means that for non-NaN, non-infinite, non-zero and non-denormal numbers (which I'll ignore) you take the bits in the mantissa and add an implicit 1 bit at the top. This makes the mantissa 53 bits, in the range 1.0 ... 1.111111...11 (binary). To get the actual value, you multiply the mantissa by 2 to the power of the exponent minus the bias (1023), and negate the result if the sign bit is 1. The number 1.0 has an unbiased exponent of zero (i.e. 1.0 = 1.0 * 2^0), and its biased exponent is 1023 (the bias is just added to the exponent). So 1.0 would be sign = 0, exponent = 1023, mantissa = 0 (remember the hidden mantissa bit).
Putting it all together, in hexadecimal the value would be 0x3FF0000000000000 == 1.0.
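Going the other way, the three fields can be packed into the 64-bit pattern to confirm that hex value. This is a sketch, assuming `double` is IEEE 754 binary64; `pack_double_bits` and `bits_to_double` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Sign in bit 63, biased exponent in bits 62..52, fraction in bits 51..0.
std::uint64_t pack_double_bits(std::uint64_t sign, std::uint64_t biased_exponent, std::uint64_t fraction)
{
    assert(sign <= 1 && biased_exponent <= 0x7FF && fraction < (std::uint64_t{1} << 52));
    return (sign << 63) | (biased_exponent << 52) | fraction;
}

double bits_to_double(std::uint64_t bits)
{
    double d;
    std::memcpy(&d, &bits, sizeof d);  // assumes double is IEEE 754 binary64
    return d;
}
```

`pack_double_bits(0, 1023, 0)` gives 0x3FF0000000000000, and setting the sign to 1 gives 0xBFF0000000000000, i.e. -1.0.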