Reputation: 395
I have a numpy array like the following:
x = array([[  1.,   2.,   3.],
           [  4.,   5.,   6.],
           [ nan,   8.,   9.]])
and I want to calculate the mean of each column. If I use np.mean(x, axis=0), then I get nan as the mean of the first column, and using x[~np.isnan(x)] to filter out nan values flattens the array into a 1D array.
I'm required to use an older version of numpy, so I can't use numpy.nanmean.
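For reference, the two behaviours described above can be reproduced with a short sketch. The variable names col_means and flat are illustrative only and are not part of the original question.

```python
import numpy as np

x = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [np.nan, 8.0, 9.0]])

# A single NaN in a column makes that column's mean NaN.
col_means = np.mean(x, axis=0)

# Boolean indexing with a 2D mask returns the selected values as a 1D array,
# so the column structure is lost.
flat = x[~np.isnan(x)]

print(col_means)
print(flat.shape)
```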
Edit: This comment explains why this isn't a duplicate of the question posted
Upvotes: 1
Views: 2849
Reputation: 395
I figured out another approach that applies the boolean mask to one column at a time, so the result is not flattened:
means = []
# Iterate over each column in x
for col in x.T:
    filtered_vals = col[~np.isnan(col)]
    avg = np.mean(filtered_vals)
    means.append(avg)
One line version:
means = [np.mean(col[~np.isnan(col)]) for col in x.T]
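One case the loop does not address is a column that contains only nan: the filtered column is then empty, and np.mean of an empty array emits a warning and returns nan. A minimal sketch that makes that case explicit might look like the following; the helper name column_nanmeans is hypothetical, and returning nan for an all-nan column is an assumption about the desired behaviour.

```python
import numpy as np

def column_nanmeans(x):
    """Per-column mean that ignores NaN (hypothetical helper name)."""
    means = []
    for col in x.T:
        vals = col[~np.isnan(col)]
        # Assumption: an all-NaN column should yield NaN rather than a warning.
        means.append(vals.mean() if vals.size else np.nan)
    return np.array(means)

# Second column is entirely NaN.
x = np.array([[1.0, np.nan],
              [4.0, np.nan]])
result = column_nanmeans(x)
print(result)
```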
Upvotes: 0
Reputation: 221614
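One approach would be to build a boolean mask of the non-nan entries, replace the nan values with 0, and divide each column sum by the number of valid entries in that column: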
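def nanmean_cols(x):
    mask = ~np.isnan(x)              # True where the entry is a valid number
    x_masked = np.where(mask, x, 0)  # nan replaced by 0 so it adds nothing to the sum
    return x_masked.sum(0) / mask.sum(0)  # column sums / count of valid entries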
Sample run -
In [114]: x
Out[114]:
array([[  1.,   2.,   3.],
       [  4.,   5.,   6.],
       [ nan,   8.,   9.]])
In [115]: np.nanmean(x,axis=0)
Out[115]: array([ 2.5, 5. , 6. ])
In [117]: nanmean_cols(x)
Out[117]: array([ 2.5, 5. , 6. ])
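As a possible alternative, NumPy's masked arrays can express the same idea. The sketch below uses np.ma.masked_invalid, which as far as I know predates nanmean, though that is worth checking against the NumPy version actually in use. The variable names masked and means are illustrative.

```python
import numpy as np

x = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [np.nan, 8.0, 9.0]])

# Mask out NaN (and inf) entries, then average over the unmasked values per column.
masked = np.ma.masked_invalid(x)

# filled(np.nan) converts the masked result back to a plain ndarray;
# a column with no valid entries would come out as nan.
means = masked.mean(axis=0).filled(np.nan)
print(means)
```

Note that masked_invalid also masks infinite values, which differs from the nan-only mask used in nanmean_cols.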
Upvotes: 2