Reputation: 582

How to ignore values when using numpy.sum and numpy.mean in matrices

Is there a way to avoid using specific values when applying sum and mean in numpy?

For instance, I'd like to exclude the value -999 from the calculation:

In [14]: c = np.matrix([[4., 2.],[4., 1.]])

In [15]: d = np.matrix([[3., 2.],[4., -999.]])

In [16]: np.sum([c, d], axis=0)
Out[16]:
array([[   7.,    4.],
       [   8., -998.]])

In [17]: np.mean([c, d], axis=0)
Out[17]:
array([[   3.5,    2. ],
       [   4. , -499. ]])

Upvotes: 8

Answers (3)

xskxzr

Reputation: 13040

In recent versions of NumPy, np.sum and np.mean accept a where parameter that specifies which elements to include. The parameter was added to np.sum in v1.17.0 and to np.mean in v1.20.0.

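For your example, with e = np.array([c, d]), you can pass where=(e > 0) to include only positive elements. Note that this also drops any legitimate zero or negative values; where=(e != -999) excludes only the sentinel: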

>>> e = np.array([c, d])
>>> np.sum(e, axis=0, where=(e>0))
array([[7., 4.],
       [8., 1.]])
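
As a rough sketch of the same idea applied to both sum and mean, the mask below excludes only the sentinel instead of all non-positive values. The names SENTINEL and keep are illustrative, not part of NumPy, and plain arrays are used in place of np.matrix as an assumption to keep the example simple:

```python
import numpy as np

SENTINEL = -999.0  # assumed "missing value" marker, taken from the question

c = np.array([[4., 2.], [4., 1.]])
d = np.array([[3., 2.], [4., SENTINEL]])
e = np.stack([c, d])  # shape (2, 2, 2): one layer per input

keep = e != SENTINEL  # True where a value should be counted

total = np.sum(e, axis=0, where=keep)     # requires NumPy >= 1.17
average = np.mean(e, axis=0, where=keep)  # requires NumPy >= 1.20

print(total.tolist())    # [[7.0, 4.0], [8.0, 1.0]]
print(average.tolist())  # [[3.5, 2.0], [4.0, 1.0]]
```

With where, np.mean divides by the number of included elements, so the bottom-right entry is 1.0 and not 0.5.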

Upvotes: 1

Eric

Reputation: 97641

Use a masked array:

>>> c = np.ma.array([[4., 2.], [4., 1.]])
>>> d = np.ma.masked_values([[3., 2.], [4., -999]], -999)

>>> np.ma.array([c, d]).sum(axis=0)
masked_array(data =
 [[7.0 4.0]
 [8.0 1.0]],
             mask =
 [[False False]
 [False False]],
       fill_value = 1e+20)

>>> np.ma.array([c, d]).mean(axis=0)
masked_array(data =
 [[3.5 2.0]
 [4.0 1.0]],
             mask =
 [[False False]
 [False False]],
       fill_value = 1e+20)
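
If the inputs are already stacked into one array, a possible variant is to mask the sentinel in a single step with np.ma.masked_equal and convert back to a plain array at the end. This is only a sketch: SENTINEL and stacked are illustrative names, and filling with NaN is an assumption about how fully masked positions should be reported:

```python
import numpy as np

SENTINEL = -999.0  # assumed "missing value" marker, taken from the question

stacked = np.array([[[4., 2.], [4., 1.]],
                    [[3., 2.], [4., SENTINEL]]])

# Mask every element exactly equal to the sentinel
masked = np.ma.masked_equal(stacked, SENTINEL)

# Reduce over the first axis; masked entries are skipped.
# filled(np.nan) turns any position that was masked in every layer into NaN.
total = masked.sum(axis=0).filled(np.nan)
average = masked.mean(axis=0).filled(np.nan)

print(total.tolist())    # [[7.0, 4.0], [8.0, 1.0]]
print(average.tolist())  # [[3.5, 2.0], [4.0, 1.0]]
```

np.ma.masked_equal compares exactly, whereas np.ma.masked_values (used above) compares floats with a tolerance; for an integer-valued sentinel such as -999 either should behave the same.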

Upvotes: 10

akuiper

Reputation: 215087

One option is to replace the specific value with np.nan and then use numpy.nansum and numpy.nanmean, as suggested in a comment by @s.k:

import numpy as np

def nan_if(arr, value):
    # Return a copy of arr with every occurrence of value replaced by NaN
    return np.where(arr == value, np.nan, arr)

np.nansum([nan_if(c, -999), nan_if(d, -999)], axis=0)
#array([[ 7.,  4.],
#       [ 8.,  1.]])

np.nanmean([nan_if(c, -999), nan_if(d, -999)], axis=0)
#array([[ 3.5,  2. ],
#       [ 4. ,  1. ]])
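
One edge case worth knowing about is a position where every input holds the sentinel. The sketch below uses made-up data (a, b) to show it: np.nansum reports 0 for such a position, while np.nanmean reports NaN and emits a RuntimeWarning, which is silenced here on purpose:

```python
import warnings
import numpy as np

SENTINEL = -999.0  # assumed "missing value" marker, taken from the question

# Hypothetical inputs: the second column is missing in both arrays
a = np.array([[4., SENTINEL]])
b = np.array([[3., SENTINEL]])

stacked = np.stack([a, b])
cleaned = np.where(stacked == SENTINEL, np.nan, stacked)

total = np.nansum(cleaned, axis=0)  # all-NaN positions sum to 0

with warnings.catch_warnings():
    # nanmean warns ("Mean of empty slice") when a position is all NaN
    warnings.simplefilter("ignore", category=RuntimeWarning)
    average = np.nanmean(cleaned, axis=0)

print(total.tolist())    # [[7.0, 0.0]]
print(average.tolist())  # [[3.5, nan]]
```

If a sum of 0 would be misleading for positions with no valid data, the NaN pattern from the mean can be used to blank them out afterwards.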

Upvotes: 9
