Mikhail

Reputation: 8038

Why does converting from np.float16 to np.float32 modify the value?

When converting a number from half-precision to single-precision floating-point representation, I see a change in the numeric value.

Here I have 65500 stored as a half precision float, but upgrading to single precision changes the underlying value to 65504, which is many floating point increments away from the target.

In this specific case, why does this happen?

(Pdb) np.asarray(65500,dtype=np.float16).astype(np.float32)
array(65504., dtype=float32)

As a side note, I also observe

(Pdb) int(np.finfo(np.float16).max)
65504

Upvotes: 1

Views: 3133

Answers (1)

Prune

Reputation: 77910

The result is not "many floating point increments away": it is the closest value that float16 can represent. Read the IEEE 754-2008 standard. It specifies 10 stored bits for the mantissa, which together with the implicit leading bit gives 11 bits of precision. Your value lies between 2^15 and 2^16, so adjacent float16 values in that range are 2^(15-10) = 2^5 = 32 apart.

The format also gives 1 bit for the sign and 5 for the characteristic (exponent).
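To make the bit layout concrete, here is a minimal sketch (assuming NumPy is installed; the variable names are only illustrative) that pulls the sign, exponent, and mantissa fields out of the float16 that results from storing 65500, and rebuilds the value from them.

```python
import numpy as np

# Store 65500 as float16 and reinterpret the same 16 bits as an unsigned integer.
bits = int(np.array([65500], dtype=np.float16).view(np.uint16)[0])

sign = bits >> 15                # 1 bit
exponent = (bits >> 10) & 0x1F   # 5 bits, biased by 15
fraction = bits & 0x3FF          # 10 stored mantissa bits

# Normal numbers: (-1)^sign * (1 + fraction / 2^10) * 2^(exponent - 15)
value = (-1) ** sign * (1 + fraction / 1024) * 2.0 ** (exponent - 15)

# Gap between adjacent float16 values that share this exponent.
step = 2.0 ** (exponent - 15 - 10)

print(f"{bits:016b}")                  # 0111101111111111
print(sign, exponent, fraction)        # 0 30 1023
print(value, step)                     # 65504.0 32.0
```

Every mantissa bit is set and the exponent is the largest finite one, which is why this is also the value reported by np.finfo(np.float16).max.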

65500 cannot be stored exactly as a float16. Its two representable neighbors are 65472 (2^5 * 2046) and 65504 (2^5 * 2047), and 65500 rounds to the nearer one, 65504. The conversion to float32 is exact, so it simply reveals the 65504 that was already stored. You lost the precision when you squeezed your number into float16, not when you widened it. When you convert in either direction, the result is always constrained by the less-precise type.
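The following sketch, again assuming NumPy and using illustrative variable names, checks where the rounding happens. The step of 32 is the float16 spacing for values between 2^15 and 2^16.

```python
import numpy as np

x = 65500
step = 32  # float16 spacing for values in [2**15, 2**16)

# Nearest representable float16 values below and above x.
lower = (x // step) * step
upper = lower + step
print(lower, upper)            # 65472 65504
print(x - lower, upper - x)    # 28 4, so x rounds up

# The rounding happens on the way into float16 ...
half = np.float16(x)
print(float(half))             # 65504.0

# ... and widening to float32 keeps that value exactly.
single = np.float32(half)
print(float(single))           # 65504.0

# float32 itself can hold 65500 exactly; narrowing it is what rounds.
narrowed = np.float32(x).astype(np.float16)
print(float(np.float32(x)), float(narrowed))   # 65500.0 65504.0
```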

Upvotes: 4
