Hugues

Reputation: 3170

How to safely round-and-clamp from float64 to int64?

This question is about python/numpy, but it may apply to other languages as well.

How can the following code be improved to safely clamp large float values to the maximum int64 value during conversion? (Ideally, it should still be efficient.)

import numpy as np

def int64_from_clipped_float64(x, dtype=np.int64):
  x = np.round(x)
  x = np.clip(x, np.iinfo(dtype).min, np.iinfo(dtype).max)
  # The problem is that np.iinfo(dtype).max is imprecisely approximated as a
  # float64, and the approximation leads to overflow in the conversion.
  return x.astype(dtype)

for x in [-3.6, 0.4, 1.7, 1e18, 1e25]:
  x = np.array(x, dtype=np.float64)
  print(f'x = {x:<10}  result = {int64_from_clipped_float64(x)}')

# x = -3.6        result = -4
# x = 0.4         result = 0
# x = 1.7         result = 2
# x = 1e+18       result = 1000000000000000000
# x = 1e+25       result = -9223372036854775808

Upvotes: 3

Views: 651

Answers (2)

Hugues

Reputation: 3170

Here is a generalization of the answer by orlp. It safely clip-converts from arbitrary float types to arbitrary integer types, and it also accepts scalar values as input.

The function is also useful for the conversion of np.float32 to np.int32 because it avoids the creation of intermediate np.float64 values, as seen in the timing measurements.

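import numpy as np

def int_from_float(x, dtype=np.int64):
  """Convert floats to integers of type `dtype`, saturating at its limits."""
  x = np.asarray(x)
  assert issubclass(x.dtype.type, np.floating)
  input_is_scalar = x.ndim == 0
  x = np.atleast_1d(x)

  imin, imax = np.iinfo(dtype).min, np.iinfo(dtype).max
  # Limits in the input's own float type; imax may round up to imax + 1.
  fmin, fmax = x.dtype.type((imin, imax))
  # Detect out-of-range values in float space ...
  too_small = x <= fmin
  too_large = x >= fmax
  ix = x.astype(dtype)
  # ... and overwrite them in integer space, where the limits are exact.
  ix[too_small] = imin
  ix[too_large] = imax
  return ix.item() if input_is_scalar else ix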


print(int_from_float(np.float32(3e9), dtype=np.int32))  # 2147483647
print(int_from_float(np.float32(5e9), dtype=np.uint32))  # 4294967295
print(int_from_float(np.float64(1e25), dtype=np.int64))  # 9223372036854775807

a = np.linspace(0, 5e9, 1_000_000, dtype=np.float32).reshape(1000, 1000)

%timeit int_from_float(np.round(a), dtype=np.int32)
# 100 loops, best of 3: 3.74 ms per loop

%timeit np.clip(np.round(a), np.iinfo(np.int32).min, np.iinfo(np.int32).max).astype(np.int32)
# 100 loops, best of 3: 5.56 ms per loop

Upvotes: 0

orlp

Reputation: 117691

The problem is that the largest np.int64 is 2**63 - 1, which is not exactly representable as a float64: it rounds up to 2**63, which overflows when converted back. The same issue doesn't happen on the other end, because -2**63 is exactly representable.
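The rounding can be checked directly. The following is a small sketch for illustration, not part of the original answer; the variable names are only illustrative.

```python
import numpy as np

imax = np.iinfo(np.int64).max  # 2**63 - 1
imin = np.iinfo(np.int64).min  # -2**63

fmax = np.float64(imax)  # nearest float64 to 2**63 - 1
fmin = np.float64(imin)

# The upper limit rounds up by one; the lower limit converts exactly.
max_rounds_up = int(fmax) == 2**63
min_is_exact = int(fmin) == imin

# Distance from 2**63 down to the next smaller float64.
gap = int(fmax) - int(np.nextafter(fmax, 0.0))

print(max_rounds_up, min_is_exact, gap)
```

Since the float version of the upper limit is already one past the largest int64, every float at or above it is out of range.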

So split the clipping into two halves: detect out-of-range values in float space, then correct them in integer space:

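def int64_from_clipped_float64(x, dtype=np.int64):
    assert x.dtype == np.float64

    limits = np.iinfo(dtype)
    # Detect out-of-range values in float space. For int64, the float version
    # of limits.max is 2**63, so `>=` catches every float that would overflow.
    too_small = x <= np.float64(limits.min)
    too_large = x >= np.float64(limits.max)
    ix = x.astype(dtype)
    # Overwrite the flagged entries in integer space, where the limits are exact.
    ix[too_small] = limits.min
    ix[too_large] = limits.max
    return ix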
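As a usage sketch, here is the same approach applied to the values from the question, with the rounding step included. The name clipped_int64 is hypothetical, and the np.errstate block is an addition that assumes recent NumPy versions warn when an out-of-range float is cast; the flagged entries are overwritten afterwards either way.

```python
import numpy as np

def clipped_int64(x):
    """Round, then convert float64 to int64, saturating at the int64 limits."""
    x = np.round(np.asarray(x, dtype=np.float64))
    limits = np.iinfo(np.int64)
    # Detect out-of-range values in float space.
    too_small = x <= np.float64(limits.min)
    too_large = x >= np.float64(limits.max)
    # Assumption: silence the cast warning some NumPy versions emit here.
    with np.errstate(invalid="ignore"):
        ix = x.astype(np.int64)
    # Correct the flagged entries in integer space.
    ix[too_small] = limits.min
    ix[too_large] = limits.max
    return ix

values = np.array([-3.6, 0.4, 1.7, 1e18, 1e25, -1e25])
result = clipped_int64(values)
print(result)
```

With this, 1e25 should saturate to 9223372036854775807 instead of wrapping around to the minimum value.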

Upvotes: 1
