Etore Marcari Jr.

Reputation: 582

How to ignore values when using numpy.sum and numpy.mean in matrices

Is there a way to avoid using specific values when applying sum and mean in numpy?

I'd like to avoid, for instance, the -999 value when calculating the result.

In [14]: c = np.matrix([[4., 2.],[4., 1.]])

In [15]: d = np.matrix([[3., 2.],[4., -999.]])

In [16]: np.sum([c, d], axis=0)
Out[16]:
array([[   7.,    4.],
       [   8., -998.]])

In [17]: np.mean([c, d], axis=0)
Out[17]:
array([[   3.5,    2. ],
       [   4. , -499. ]])

Upvotes: 8

Views: 14193

Answers (3)

xskxzr

Reputation: 13040

In recent versions of NumPy, np.sum and np.mean accept a where parameter that specifies which elements to include. The parameter was added to np.sum in v1.17.0 and to np.mean in v1.20.0.

For your example, you can set the parameter to where=(np.array([c, d]) > 0) to include only positive elements, which excludes -999 because every valid value here is positive:

>>> e = np.array([c, d])
>>> np.sum(e, axis=0, where=(e>0))
array([[7., 4.],
       [8., 1.]])
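
If valid data can also be zero or negative, a mask that tests for the placeholder itself may be safer than e > 0. The following is a minimal sketch of that idea, not part of the original answer; the names SENTINEL and keep are introduced here for illustration, and plain arrays are used instead of np.matrix to keep it self-contained.

```python
import numpy as np

SENTINEL = -999.0  # assumed "no data" placeholder, taken from the question

c = np.array([[4., 2.], [4., 1.]])
d = np.array([[3., 2.], [4., SENTINEL]])
e = np.array([c, d])  # stacked into shape (2, 2, 2)

# True wherever the element is real data, False wherever it is the placeholder
keep = e != SENTINEL

total = np.sum(e, axis=0, where=keep)     # where= needs NumPy >= 1.17
average = np.mean(e, axis=0, where=keep)  # where= needs NumPy >= 1.20

print(total)    # [[7. 4.] [8. 1.]]
print(average)  # [[3.5 2. ] [4.  1. ]]
```

With where=, np.mean divides by the number of included elements at each position, so the bottom-right entry is 1.0 rather than an average over two values.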

Upvotes: 1

Eric

Reputation: 97641

Use a masked array:

>>> c = np.ma.array([[4., 2.], [4., 1.]])
>>> d = np.ma.masked_values([[3., 2.], [4., -999]], -999)

>>> np.ma.array([c, d]).sum(axis=0)
masked_array(data =
 [[7.0 4.0]
 [8.0 1.0]],
             mask =
 [[False False]
 [False False]],
       fill_value = 1e+20)

>>> np.ma.array([c, d]).mean(axis=0)
masked_array(data =
 [[3.5 2.0]
 [4.0 1.0]],
             mask =
 [[False False]
 [False False]],
       fill_value = 1e+20)
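
If the inputs are already plain arrays, it may be simpler to stack them first and mask the placeholder in one step. This is a small sketch under that assumption, not the answer's exact code; the name stacked is illustrative.

```python
import numpy as np

c = np.array([[4., 2.], [4., 1.]])
d = np.array([[3., 2.], [4., -999.]])

# Stack first, then mask every element equal to -999 in a single call
stacked = np.ma.masked_values(np.array([c, d]), -999.)

total = stacked.sum(axis=0)     # masked elements are skipped
average = stacked.mean(axis=0)  # divides by the count of unmasked elements

print(total)
print(average)
```

If every input at some position were -999, that position would come back masked in the result instead of holding a number.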

Upvotes: 10

akuiper

Reputation: 215087

One option is to replace the specific value with np.nan and then use numpy.nansum and numpy.nanmean as commented by @s.k:

import numpy as np

def nan_if(arr, value):
    # Return a copy of arr with every element equal to value replaced by NaN
    return np.where(arr == value, np.nan, arr)

np.nansum([nan_if(c, -999), nan_if(d, -999)], axis=0)
#array([[ 7.,  4.],
#       [ 8.,  1.]])

np.nanmean([nan_if(c, -999), nan_if(d, -999)], axis=0)
#array([[ 3.5,  2. ],
#       [ 4. ,  1. ]])
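
One edge case worth knowing with this approach is a position where every input holds the placeholder. The sketch below uses made-up arrays a and b (not from the question) to show it: np.nansum returns 0 for such a position, while np.nanmean returns nan and emits a RuntimeWarning, which is silenced here.

```python
import warnings
import numpy as np

def nan_if(arr, value):
    # Replace every element equal to value with NaN
    return np.where(arr == value, np.nan, arr)

# Hypothetical inputs: the top-right position is -999 in both arrays
a = np.array([[4., -999.], [4., 1.]])
b = np.array([[3., -999.], [4., -999.]])
stacked = np.array([nan_if(a, -999), nan_if(b, -999)])

total = np.nansum(stacked, axis=0)  # an all-NaN position sums to 0

with warnings.catch_warnings():
    # nanmean warns about the all-NaN position and returns nan there
    warnings.simplefilter("ignore", category=RuntimeWarning)
    average = np.nanmean(stacked, axis=0)

print(total)
print(average)
```

If a 0 in the sum would be misleading for your data, checking np.isnan(stacked).all(axis=0) first is one way to find those positions.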

Upvotes: 9
