Reputation: 582
Is there a way to ignore specific values when computing sum and mean in numpy?
For instance, I'd like to exclude the -999 value from the result.
In [14]: c = np.matrix([[4., 2.],[4., 1.]])
In [15]: d = np.matrix([[3., 2.],[4., -999.]])
In [16]: np.sum([c, d], axis=0)
Out[16]:
array([[   7.,    4.],
       [   8., -998.]])
In [17]: np.mean([c, d], axis=0)
Out[17]:
array([[   3.5,    2. ],
       [   4. , -499. ]])
Upvotes: 8
Views: 14193
Reputation: 13040
In recent versions of numpy, np.sum and np.mean have a where parameter that specifies which elements to include. It was added to np.sum in v1.17.0 and to np.mean in v1.20.0.
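For your example, you can pass where=(np.array([c, d]) > 0) to include only the positive elements (this also excludes zeros and every other negative value; comparing with != -999 instead excludes only the -999 entries):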
>>> e = np.array([c, d])
>>> np.sum(e, axis=0, where=(e>0))
array([[7., 4.],
       [8., 1.]])
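The same where mask also works for np.mean, which then divides by the number of included elements rather than by the full length of the axis. A minimal sketch, assuming NumPy >= 1.20 and using plain ndarrays instead of np.matrix (the NumPy docs no longer recommend np.matrix); it compares with != -999 so that only the sentinel is dropped:

```python
import numpy as np  # assumes NumPy >= 1.20, where np.mean accepts `where`

c = np.array([[4., 2.], [4., 1.]])
d = np.array([[3., 2.], [4., -999.]])
e = np.array([c, d])      # shape (2, 2, 2): inputs stacked along a new first axis

valid = e != -999         # True wherever the value should be counted

total = np.sum(e, axis=0, where=valid)     # excluded elements contribute 0
average = np.mean(e, axis=0, where=valid)  # divides by the count of True entries

print(total)
print(average)
```

Here total is [[7, 4], [8, 1]] and average is [[3.5, 2], [4, 1]]. If every element at some position is excluded, np.mean returns nan there and emits a RuntimeWarning.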
Upvotes: 1
Reputation: 97641
Use a masked array:
>>> c = np.ma.array([[4., 2.], [4., 1.]])
>>> d = np.ma.masked_values([[3., 2.], [4., -999]], -999)
>>> np.ma.array([c, d]).sum(axis=0)
masked_array(data =
 [[7.0 4.0]
  [8.0 1.0]],
             mask =
 [[False False]
  [False False]],
       fill_value = 1e+20)
>>> np.ma.array([c, d]).mean(axis=0)
masked_array(data =
 [[3.5 2.0]
  [4.0 1.0]],
             mask =
 [[False False]
  [False False]],
       fill_value = 1e+20)
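A variant of the same idea, as a sketch: stack the inputs first, mask every -999 in one call with np.ma.masked_equal, then call .filled() to get a plain ndarray back. SENTINEL is a hypothetical name for the ignored value. Note that masked_equal tests for exact equality, whereas masked_values (used above) compares with a floating-point tolerance, which is safer if the stored value may not be exactly -999.

```python
import numpy as np

SENTINEL = -999.0  # hypothetical name for the value to ignore; assumed stored exactly

c = np.array([[4., 2.], [4., 1.]])
d = np.array([[3., 2.], [4., -999.]])

# Stack first, then mask every occurrence of the sentinel in one step.
stacked = np.ma.masked_equal(np.stack([c, d]), SENTINEL)

total = stacked.sum(axis=0)      # masked entries are skipped
average = stacked.mean(axis=0)   # divides by the number of unmasked entries

# Positions where every input was masked stay masked in the result;
# .filled() replaces them with an explicit value and returns a plain ndarray.
total_plain = total.filled(0.0)
average_plain = average.filled(np.nan)
```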
Upvotes: 10
Reputation: 215087
One option is to replace the specific value with np.nan and then use numpy.nansum and numpy.nanmean, as suggested in a comment by @s.k:
import numpy as np

def nan_if(arr, value):
    # Replace every element equal to `value` with NaN
    return np.where(arr == value, np.nan, arr)

# c and d as defined in the question
np.nansum([nan_if(c, -999), nan_if(d, -999)], axis=0)
#array([[ 7.,  4.],
#       [ 8.,  1.]])
np.nanmean([nan_if(c, -999), nan_if(d, -999)], axis=0)
#array([[ 3.5,  2. ],
#       [ 4. ,  1. ]])
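One behavior worth knowing with this approach: at a position where every input is NaN, np.nansum returns 0.0, while np.nanmean returns nan and emits a RuntimeWarning ("Mean of empty slice"). A short sketch with hypothetical arrays a and b, both holding -999 at position [1, 1]; nan_if is the helper from this answer:

```python
import warnings
import numpy as np

def nan_if(arr, value):
    # Replace every element equal to `value` with NaN
    return np.where(np.asarray(arr) == value, np.nan, arr)

# Hypothetical inputs: position [1, 1] is -999 in *both* arrays.
a = np.array([[1., 2.], [3., -999.]])
b = np.array([[5., 6.], [7., -999.]])
stacked = np.array([nan_if(a, -999), nan_if(b, -999)])

total = np.nansum(stacked, axis=0)  # an all-NaN slice sums to 0.0

with warnings.catch_warnings():
    # Silence the "Mean of empty slice" warning for the all-NaN position.
    warnings.simplefilter("ignore", RuntimeWarning)
    average = np.nanmean(stacked, axis=0)  # an all-NaN slice gives nan
```

If 0.0 is not an acceptable sum for a position with no valid data, the masked-array approach keeps such positions masked instead.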
Upvotes: 9