Why can the maximum value of a numpy array not be expressed in that dtype?

Question

I am converting a NumPy array from a float dtype to an integer dtype. In the process, I want to clamp values above the maximum allowed by the integer dtype to that maximum. But for some reason this fails, and the conversion returns the minimum value instead. Here is code to reproduce (Python 3, NumPy 1.22.2), using just numpy.inf as an example:

import numpy

float_array = numpy.array([[1, +numpy.inf], [2, 2]])
dtype = numpy.dtype(numpy.int64)
# Replace +inf with the largest value the integer dtype can hold.
cut_array = numpy.nan_to_num(float_array, posinf=numpy.iinfo(dtype).max)
int_array = cut_array.astype(dtype)

After this, int_array[0,1] equals -9223372036854775808. Why is the maximum representable value (about 9.2e+18) not actually usable for dtype int64?

I tested a bit: a slightly smaller value than the max works, e.g. posinf=numpy.iinfo(dtype).max - 600 converts correctly.

ouistiti · Accepted Answer

From the comments by Warren Weckesser and Tim Roberts: a double has only 53 bits of precision, so it cannot represent every int64 value exactly. For example, int(float(9223372036854775807)) = 9223372036854775808. The conversion to float rounds the original integer up to the nearest representable double, 2^63, which is one more than the int64 maximum, so casting back to int64 overflows.
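
As a rough illustration of the rounding described here, and of one possible workaround, the sketch below checks that the int64 maximum rounds up to 2^63 as a float, then clamps to the largest float64 strictly below 2^63 instead. The name safe_max is hypothetical, and using numpy.nextafter to find the bound is an assumption of this sketch, not something suggested in the original comments.

```python
import numpy

dtype = numpy.dtype(numpy.int64)
int_max = numpy.iinfo(dtype).max  # 2**63 - 1

# float64 has a 53-bit significand, so int_max is not representable
# and rounds up to 2**63, which is one past the int64 range.
assert float(int_max) == 2.0**63
assert int(float(int_max)) == int_max + 1

# Hypothetical workaround: clamp to the largest float64 strictly below 2**63.
# Doubles just below 2**63 are spaced 1024 apart, so this is 2**63 - 1024.
safe_max = numpy.nextafter(numpy.float64(2.0**63), numpy.float64(0.0))

float_array = numpy.array([[1, +numpy.inf], [2, 2]])
cut_array = numpy.nan_to_num(float_array, posinf=safe_max)
int_array = cut_array.astype(dtype)

print(int_array[0, 1])  # 9223372036854774784
```

The clamped value is 1023 below the true int64 maximum, which is the cost of going through float64. This is also consistent with the observation in the question that subtracting 600 from the maximum works: that value rounds down to the same double rather than up to 2^63.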
