Reputation: 8038
When converting a number from half to single floating representation I see a change in the numeric value.
Here I have 65500
stored as a half precision float, but upgrading to single precision changes the underlying value to 65504
, which is many floating point increments away from the target.
In this specific case, why does this happen?
(Pdb) np.asarray(65500,dtype=np.float16).astype(np.float32)
array(65504., dtype=float32)
As a side note, I also observe
(Pdb) int(np.finfo(np.float16).max)
65504
Upvotes: 1
Views: 3133
Reputation: 77910
The error is not "many floating point increments away" [corrected to match OP's improved wording]. Read the standard IEEE 754-2008. It specifies 10 bits for the mantissa, or 1024 distinct values. Your value is on the close order of 2^16, so you have an increment of 2^6, or 64.
The format also gives 1 bit for the sign and 5 for the characteristic (exponent).
65500 is stored as something equivalent to + 2^6 * 1023.5
. This translates directly to 65504
when you convert to float32. You lost the precision when you converted your larger number to 10 bits of precision. When you convert in either direction, the result is always constrained by the less-precise type.
Upvotes: 4