This question is about Python/NumPy, but it may apply to other languages as well.
How can the following code be improved to safely clamp large float values to the maximum int64 value during conversion? (Ideally, it should still be efficient.)
import numpy as np

def int64_from_clipped_float64(x, dtype=np.int64):
    x = np.round(x)
    x = np.clip(x, np.iinfo(dtype).min, np.iinfo(dtype).max)
    # The problem is that np.iinfo(dtype).max (2**63 - 1) cannot be represented
    # exactly as a float64; it rounds up to 2**63, which overflows in the conversion.
    return x.astype(dtype)

for x in [-3.6, 0.4, 1.7, 1e18, 1e25]:
    x = np.array(x, dtype=np.float64)
    print(f'x = {x:<10} result = {int64_from_clipped_float64(x)}')
# x = -3.6 result = -4
# x = 0.4 result = 0
# x = 1.7 result = 2
# x = 1e+18 result = 1000000000000000000
# x = 1e+25 result = -9223372036854775808
Here is a generalization of the answer by orlp@ to safely clip-convert from arbitrary floats to arbitrary integers, and to support scalar values as input.
The function is also useful for converting np.float32 to np.int32, because it avoids creating intermediate np.float64 values, as the timing measurements show.
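def int_from_float(x, dtype=np.int64):
    x = np.asarray(x)
    assert issubclass(x.dtype.type, np.floating)
    input_is_scalar = x.ndim == 0
    x = np.atleast_1d(x)
    imin, imax = np.iinfo(dtype).min, np.iinfo(dtype).max
    # Limits in the input's float type; imax may round up (e.g. 2**31 - 1 -> 2**31 in float32).
    fmin, fmax = x.dtype.type((imin, imax))
    too_small = x <= fmin
    too_large = x >= fmax
    # Out-of-range elements give garbage here (and a RuntimeWarning in newer NumPy);
    # they are overwritten with the exact integer limits below.
    with np.errstate(invalid='ignore'):
        ix = x.astype(dtype)
    ix[too_small] = imin
    ix[too_large] = imax
    return ix.item() if input_is_scalar else ix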
print(int_from_float(np.float32(3e9), dtype=np.int32)) # 2147483647
print(int_from_float(np.float32(5e9), dtype=np.uint32)) # 4294967295
print(int_from_float(np.float64(1e25), dtype=np.int64)) # 9223372036854775807
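int_from_float relies on two properties of the converted limits fmin and fmax: converting the integer limits to the float type may round them outward but never inward, and the floats just inside those limits convert to in-range integers. The sketch below checks both properties for a few dtype pairs. `clamp_limits_are_safe` is a hypothetical helper name, the pairs are only examples rather than a proof, and it assumes the integer limits are finite in the float type (e.g. it does not handle float16 with int32, where the limit becomes inf).

```python
import numpy as np

def clamp_limits_are_safe(ftype, itype):
    """Hypothetical helper: check the two facts int_from_float relies on
    for one (float type, integer type) pair."""
    info = np.iinfo(itype)
    fmin, fmax = ftype(info.min), ftype(info.max)
    # 1. The converted limits round outward (or are exact), so `x >= fmax`
    #    and `x <= fmin` catch every value that does not fit.
    rounds_outward = int(fmax) >= info.max and int(fmin) <= info.min
    # 2. The neighbouring floats just inside the limits still fit, so the
    #    plain astype() is valid for every element that is not overwritten.
    below_max = np.nextafter(fmax, ftype(0))
    above_min = np.nextafter(fmin, ftype(0))
    inside_fits = info.min <= int(above_min) and int(below_max) <= info.max
    return rounds_outward and inside_fits

pairs = [(np.float32, np.int32), (np.float32, np.uint32),
         (np.float64, np.int64), (np.float64, np.uint64),
         (np.float64, np.int32)]
for ftype, itype in pairs:
    print(ftype.__name__, itype.__name__, clamp_limits_are_safe(ftype, itype))

# float32 has a 24-bit significand, so int32's max rounds up to 2**31.
print(int(np.float32(np.iinfo(np.int32).max)))  # 2147483648
```

Every pair should report True; the rounding-up case is exactly why the function compares with `>=` instead of `>`.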
a = np.linspace(0, 5e9, 1_000_000, dtype=np.float32).reshape(1000, 1000)
%timeit int_from_float(np.round(a), dtype=np.int32)
# 100 loops, best of 3: 3.74 ms per loop
%timeit np.clip(np.round(a), np.iinfo(np.int32).min, np.iinfo(np.int32).max).astype(np.int32)
# 100 loops, best of 3: 5.56 ms per loop
The problem is that the largest np.int64 is 2^63 - 1, which is not exactly representable as a float64 (it has only a 53-bit significand), so it rounds up to 2^63, one past the limit. The same issue doesn't happen on the other end, because -2^63 is exactly representable.
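A quick check of these claims in NumPy (a sketch; the variable names are illustrative): the float64 nearest to 2^63 - 1 is 2^63 itself, -2^63 converts exactly, and the float64 just below 2^63 is 2^63 - 1024, which does fit in int64. That last fact is why a `>=` comparison against the float64 version of the limit flags exactly the values that would overflow.

```python
import numpy as np

imax = np.iinfo(np.int64).max   # 2**63 - 1, a Python int
imin = np.iinfo(np.int64).min   # -2**63

fmax = np.float64(imax)
fmin = np.float64(imin)

# 2**63 - 1 needs 63 significant bits; float64 has 53, so it rounds up to 2**63.
print(fmax == 2.0**63)          # True
print(int(fmax) - imax)         # 1  (one past the largest int64)

# -2**63 is a power of two, so it is represented exactly.
print(int(fmin) == imin)        # True

# The largest float64 below 2**63 is 2**63 - 1024, which fits in int64.
below = np.nextafter(fmax, 0.0)
print(imax - int(below))        # 1023
```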
So do the clipping half in float space (for detection) and half in integer space (for correction):
def int64_from_clipped_float64(x, dtype=np.int64):
    assert x.dtype == np.float64
    limits = np.iinfo(dtype)
    too_small = x <= np.float64(limits.min)
    too_large = x >= np.float64(limits.max)
    ix = x.astype(dtype)
    ix[too_small] = limits.min
    ix[too_large] = limits.max
    return ix
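Note that this version no longer rounds, unlike the function in the question. A minimal end-to-end check, assuming the caller applies np.round first and passes a 1-D float64 array, could look like the following; the function is repeated so the snippet runs on its own, with an np.errstate guard added (not part of the answer) to silence the "invalid value encountered in cast" warning that newer NumPy versions emit for the out-of-range elements before they are overwritten.

```python
import numpy as np

def int64_from_clipped_float64(x, dtype=np.int64):
    # Detect out-of-range values in float space...
    assert x.dtype == np.float64
    limits = np.iinfo(dtype)
    too_small = x <= np.float64(limits.min)
    too_large = x >= np.float64(limits.max)
    # ...cast (out-of-range results are garbage and get overwritten)...
    with np.errstate(invalid='ignore'):
        ix = x.astype(dtype)
    # ...and correct them with the exact integer limits.
    ix[too_small] = limits.min
    ix[too_large] = limits.max
    return ix

x = np.array([-3.6, 0.4, 1.7, 1e18, 1e25, -1e25])
result = int64_from_clipped_float64(np.round(x))  # round first, as the question did
print(result.tolist())
# [-4, 0, 2, 1000000000000000000, 9223372036854775807, -9223372036854775808]
```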