tschm

Reputation: 2955

Numpy only on finite entries

Here's a brief example of a function that maps a vector to a vector. Entries that are NaN or inf should be ignored in the computation. My current implementation looks rather clumsy to me. Do you have any suggestions?

from scipy import stats
import numpy as np

def p(vv):
    mask = np.isfinite(vv)
    y = np.nan * vv
    v = vv[mask]

    y[mask] = 1/v*(stats.hmean(v)/len(v))
    return y

Upvotes: 0

Views: 1737

Answers (3)

prijatelj

Reputation: 885

Masked arrays provide this functionality and let you specify the mask as you wish. The NumPy 1.18 docs for them are here: https://numpy.org/doc/1.18/reference/maskedarray.generic.html#what-is-a-masked-array

In a masked array, entries whose mask value is False are used in calculations, while entries whose mask value is True are ignored.

Example for obtaining the mean of only the finite values using np.isfinite():

import numpy as np

# Seeding for reproducing these results
np.random.seed(0)

# Generate random data and add some non-finite values
x = np.random.randint(0, 5, (3, 3)).astype(np.float32)
x[1,2], x[2,1], x[2,2] = np.inf, -np.inf, np.nan
# array([[  4.,   0.,   3.],
#        [  3.,   3.,  inf],
#        [  3., -inf,  nan]], dtype=float32)

# Make masked array. Note the logical not of isfinite
x_masked = np.ma.masked_array(x, mask=~np.isfinite(x))

# Mean of entire masked matrix
x_masked.mean()
# 2.6666666666666665

# Masked matrix's row means
x_masked.mean(1)
# masked_array(data=[2.3333333333333335, 3.0, 3.0],
#              mask=[False, False, False],
#        fill_value=1e+20)

# Masked matrix's column means
x_masked.mean(0)
# masked_array(data=[3.3333333333333335, 1.5, 3.0],
#              mask=[False, False, False],
#        fill_value=1e+20)

Note that scipy.stats.hmean() also works with masked arrays.

Note that if all you care about is detecting NaNs and leaving infs, then you can use np.isnan() instead of np.isfinite().
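Applied to the function from the question, a masked array can carry the finite-only logic. The following is a minimal sketch rather than a definitive implementation: the name p_masked is made up for illustration, and it assumes the finite entries are strictly positive, as the harmonic mean requires. It relies on the identity hmean(v)/len(v) = 1/sum(1/v), so scipy is not needed here.

```python
import numpy as np


def p_masked(vv):
    # Hypothetical helper name, not from the original posts.
    # masked_invalid masks every NaN and +/-inf entry.
    m = np.ma.masked_invalid(np.asarray(vv, dtype=float))
    inv = 1.0 / m                 # masked entries stay masked
    weights = inv / inv.sum()     # sum() skips masked entries
    # Turn the masked entries back into NaN in a plain ndarray.
    return weights.filled(np.nan)


y = p_masked([1.0, 2.0, np.nan, np.inf, 4.0])
```

The finite inputs 1, 2 and 4 map to 4/7, 2/7 and 1/7, and the NaN and inf positions come back as NaN.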

Upvotes: 1

tschm

Reputation: 2955

I came up with the following construction:

from scipy import stats
import numpy as np


# Operate only on the finite entries of x; every other entry of the result y is NaN
def __f(func, x):
    mask = np.isfinite(x)
    y = np.nan * x
    y[mask] = func(x[mask])
    return y


# implementation of the parity function
def __pp(x):
    return 1/x*(stats.hmean(x)/len(x))


def pp(vv):
    return __f(__pp, vv)
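A usage sketch for this wrapper pattern, under some assumptions: the names apply_on_finite and inverse_weights are hypothetical stand-ins for __f and __pp, and the finite entries are assumed to be strictly positive. Because the harmonic mean of n values is n / sum(1/x), the expression 1/x*(stats.hmean(x)/len(x)) reduces to (1/x) / sum(1/x), so this version needs only NumPy.

```python
import numpy as np


def apply_on_finite(func, x):
    # Hypothetical name; same idea as the wrapper above.
    x = np.asarray(x, dtype=float)
    mask = np.isfinite(x)
    y = np.full(x.shape, np.nan)   # non-finite positions stay NaN
    y[mask] = func(x[mask])
    return y


def inverse_weights(x):
    # (1/x) / sum(1/x), equal to 1/x * (hmean(x) / len(x)) for positive x
    inv = 1.0 / x
    return inv / inv.sum()


y = apply_on_finite(inverse_weights, [2.0, np.nan, 2.0, -np.inf])
# y is [0.5, nan, 0.5, nan]; the finite entries sum to 1
```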

Upvotes: 1

user1749431

Reputation: 569

You can set the NaN values to zero with NumPy's isnan function and then remove the zeros as follows:

import numpy as np

def p(vv):
    # assuming vv is your array
    # use NumPy's isnan function to replace the NaN values in the array with zero
    # (note: this modifies vv in place)
    replace_NaN = np.isnan(vv)
    vv[replace_NaN] = 0

    # convert array vv to list
    vv_list = vv.tolist()
    new_list = []

    # loop over vv_list and exclude 0 values
    # (zeros that were already in vv are dropped as well)
    for i in vv_list:
        if i != 0:
            new_list.append(i)

    # build the array vv again from the remaining values
    vv = np.array(new_list, dtype='float64')

    return vv
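For comparison, the same filtering can be sketched with a boolean mask instead of a Python loop. This is an alternative technique, not the code above: it keeps genuine zeros, also drops inf entries, and leaves the input array unmodified. The name drop_non_finite is hypothetical.

```python
import numpy as np


def drop_non_finite(vv):
    # Hypothetical helper name.
    vv = np.asarray(vv, dtype=float)
    # Boolean indexing returns a new array holding only the finite entries.
    return vv[np.isfinite(vv)]


out = drop_non_finite([0.0, 1.5, np.nan, np.inf, 3.0])
# out is array([0. , 1.5, 3. ])
```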

Upvotes: 1
