Reputation: 319
I would like to know how NumPy casts from float32 to float16, because when I cast a number like 8193 from float32 to float16 using astype, the output is 8192, whereas a float32 10000 casts to a float16 10000.
import numpy as np
a = np.array([8193], dtype=np.float32)
b = a.astype(np.float16)  # array([8192.], dtype=float16)
Upvotes: 16
Views: 57212
Reputation: 129
TensorFlow requires float16 and raises an error for float32. You can use what Reti43 suggested:
np.float16(a)
Out[102]: array([8192.], dtype=float16)
I'm surprised that a useless reply has been upvoted so highly. I know that moderators ask authors to accept the highest-voted answer, but a question author is not obliged to do so. A number of people here just collect points and do not care about actually answering the question; they might even upvote themselves under different accounts.
Upvotes: -2
Reputation: 55469
The IEEE 754-2008 16-bit base-2 format, aka binary16, doesn't give you a lot of precision. What do you expect from 16 bits? :) 1 bit is the sign bit, 5 bits encode the exponent, and that leaves 10 bits to store the normalised 11-bit significand, so any integer > 2**11 == 2048 has to be quantized.
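You can see the 1/5/10 bit layout directly by reinterpreting a float16's bits as an unsigned integer. A small sketch: 8192 == 2**13, so its exponent field should hold 13 plus the bias of 15, i.e. 28, with an all-zero mantissa.

```python
import numpy as np

# View the raw 16 bits of a float16 as uint16 and split them
# into the sign (1 bit), exponent (5 bits), mantissa (10 bits).
bits = np.array(8192, dtype=np.float16).view(np.uint16)
sign = int(bits >> 15)
exponent = int((bits >> 10) & 0x1F)
mantissa = int(bits & 0x3FF)
print(sign, exponent, mantissa)  # 0 28 0
```

Because the mantissa has only 10 stored bits, any integer needing more than 11 significant bits cannot be represented exactly.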
According to Wikipedia, integers between 4097 and 8192 round to a multiple of 4, and integers between 8193 and 16384 round to a multiple of 8.
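You can check the rounding behaviour from the question directly; in [8192, 16384) the float16 spacing is 8, so 8193 rounds down to 8192, while 10000 (a multiple of 8) survives the cast exactly:

```python
import numpy as np

# Integers in [8192, 16384) round to the nearest multiple of 8
# when cast to float16 (ties go to the even mantissa).
vals = np.array([8193, 8200, 10000], dtype=np.float32)
halves = vals.astype(np.float16)
print(halves)  # [ 8192.  8200. 10000.]
```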
Upvotes: 19