Reputation: 28531
I have two arrays, and I'd like to take the per-cell average of them while ignoring NaNs.
My two arrays are:
In [267]: a = np.array([ [1, 2, np.nan], [np.nan, 5, 6], [np.nan, np.nan, np.nan]])
In [268]: a
Out[268]:
array([[  1.,   2.,  nan],
       [ nan,   5.,   6.],
       [ nan,  nan,  nan]])
In [269]: b = np.array( [ [2, np.nan, 6], [8, np.nan, 12], [14, 16, np.nan]])
In [270]: b
Out[270]:
array([[  2.,  nan,   6.],
       [  8.,  nan,  12.],
       [ 14.,  16.,  nan]])
If I didn't need to account for NaNs, I could simply do:
In [271]: (a+b)/2
Out[271]:
array([[ 1.5,  nan,  nan],
       [ nan,  nan,  9. ],
       [ nan,  nan,  nan]])
However, I need to do the mean calculation so that mean(2.5, nan) == 2.5, and thus NaNs are ignored, unless I have two NaNs, in which case mean(nan, nan) == nan.
Thus, the result I'd like to get is:
Out[271]:
array([[ 1.5,  2. ,  6. ],
       [ 8. ,  5. ,  9. ],
       [ 14.,  16.,  nan]])
The scipy.stats.nanmean function seems to do this. However, to use it, I think I need to stack the arrays properly. I have two 3 x 3 arrays, and I think I need to create a 2 x 3 x 3 array - is that right? I can't seem to stack these arrays to get a result with those dimensions - I've tried np.dstack as well as various other techniques, but nothing seems to work.
I suspect I'm doing something silly - any ideas as to how I can fix this?
Upvotes: 2
Views: 711
Reputation: 31050
I combined the arrays using np.array:
>>> c = np.array([a, b])
>>> c
array([[[  1.,   2.,  nan],
        [ nan,   5.,   6.],
        [ nan,  nan,  nan]],

       [[  2.,  nan,   6.],
        [  8.,  nan,  12.],
        [ 14.,  16.,  nan]]])
>>> scipy.stats.nanmean(c,axis=0)
array([[  1.5,   2. ,   6. ],
       [  8. ,   5. ,   9. ],
       [ 14. ,  16. ,   nan]])
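For anyone on a recent SciPy: as far as I know, scipy.stats.nanmean was deprecated and later removed, and np.nanmean (NumPy 1.8 or later) is the usual replacement. Here is a minimal sketch of the same approach with NumPy only, using the a and b from the question. The warnings filter is my own addition: np.nanmean emits a RuntimeWarning for cells that are NaN in both arrays, while still returning nan there.

```python
import warnings

import numpy as np

a = np.array([[1, 2, np.nan], [np.nan, 5, 6], [np.nan, np.nan, np.nan]])
b = np.array([[2, np.nan, 6], [8, np.nan, 12], [14, 16, np.nan]])

# Stack the two 3 x 3 arrays along a new leading axis: shape (2, 3, 3).
c = np.array([a, b])

# Cells that are NaN in both inputs trigger a "Mean of empty slice"
# RuntimeWarning; silence it locally, since nan is the wanted result there.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=RuntimeWarning)
    result = np.nanmean(c, axis=0)

print(result)
```

This should print the same values as the scipy.stats.nanmean call above, with nan only in the bottom-right cell.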
Upvotes: 2
Reputation: 6376
You need to concatenate the arrays along a new axis (the third dimension, axis 2). You can then take the nanmean over this axis.
In [1]: c = np.concatenate([a[..., None], b[..., None]], axis=2)
In [2]: scipy.stats.nanmean(c, axis=2)
Out[2]:
array([[  1.5,   2. ,   6. ],
       [  8. ,   5. ,   9. ],
       [ 14. ,  16. ,   nan]])
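As a side note on the stacking itself, here is a small sketch (not part of the original answer) comparing the concatenate call above with np.stack and np.dstack. It assumes NumPy 1.10 or later for np.stack. All three should give a 3 x 3 x 2 array, which may be why np.dstack did not produce the 2 x 3 x 3 shape the question expected: it puts the new axis last, so the mean has to be taken over axis 2 rather than axis 0.

```python
import numpy as np

a = np.array([[1, 2, np.nan], [np.nan, 5, 6], [np.nan, np.nan, np.nan]])
b = np.array([[2, np.nan, 6], [8, np.nan, 12], [14, 16, np.nan]])

# The approach from the answer: add a trailing axis to each, then join on it.
c_concat = np.concatenate([a[..., None], b[..., None]], axis=2)

# np.stack creates the new axis for you at the requested position.
c_stack = np.stack([a, b], axis=2)

# np.dstack joins 2-D arrays along a new third axis (depth-wise).
c_dstack = np.dstack([a, b])

print(c_concat.shape, c_stack.shape, c_dstack.shape)

# All three hold the same data (NaNs compared as equal).
same = (np.allclose(c_concat, c_stack, equal_nan=True)
        and np.allclose(c_concat, c_dstack, equal_nan=True))
print(same)
```

Whichever you pick, the axis passed to the nanmean call has to be the one that was newly created.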
Upvotes: 2