Reputation: 24104
I have a 3-dimensional array with dimensions n * d * m. However, n_i can vary between the sub-arrays. It looks something like this:
[[[1,1,3], [3,4,2]],
 [[3,2,1], [3,4,2], [4,5,3]],
 [[4,3,2], [5,2,3]]]
I need to calculate the mean across all data points. I was using the scipy.stats.mean
function, but it threw an error about mismatched dimensions. Therefore, I was thinking about padding the array to the largest n_i so that it has uniform dimensions, something like this:
[[[1,1,3], [3,4,2], [NaN,NaN,NaN]],
 [[3,2,1], [3,4,2], [4,5,3]],
 [[4,3,2], [5,2,3], [NaN,NaN,NaN]]]
but I don't know if this is the best solution or how I could calculate the mean with NaN.
Any suggestions?
Upvotes: 3
Views: 4405
Reputation: 5162
It looks like the NumPy array functions fail because your object is a list of 2D arrays with different shapes rather than a true 3D NumPy array.
A solution is to unroll the list, sum each sub-array separately, and divide by the total number of elements. In your case this will be:
In [23]: import numpy as np
In [24]: L = np.array([[[1,1,3],[3,4,2]],
                      [[3,2,1],[3,4,2],[4,5,3]],[[4,3,2],[5,2,3]]],
                     dtype=object)  # recent NumPy versions reject ragged input without dtype=object
In [28]: Lsum = ( np.sum(L[0]) + np.sum(L[1]) + np.sum(L[2]) )
In [29]: Lmean = Lsum.astype(float) \
/ ( np.size(L[0]) + np.size(L[1]) + np.size(L[2]) )
In [46]: Lmean
Out[46]: 2.8571428571428572
This can be put into a loop over the sub-arrays when their number or their lengths vary.
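As a rough sketch of that loop, the sums and sizes can be accumulated over a plain Python list of 2D arrays. The names `blocks` and `ragged_mean` are made up for illustration and are not part of NumPy.

```python
import numpy as np

# Ragged data kept as a plain list of 2D arrays, each with its own row count
blocks = [
    np.array([[1, 1, 3], [3, 4, 2]]),
    np.array([[3, 2, 1], [3, 4, 2], [4, 5, 3]]),
    np.array([[4, 3, 2], [5, 2, 3]]),
]

def ragged_mean(arrays):
    """Mean over every element of a list of arrays with differing shapes."""
    total = sum(a.sum() for a in arrays)    # sum of all values
    count = sum(a.size for a in arrays)     # number of values
    return total / count

overall_mean = ragged_mean(blocks)
print(overall_mean)
```

Keeping the data as a list avoids having to build a single ragged array at all.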
Upvotes: 0
Reputation: 363597
You could use masked arrays:
>>> from numpy import ma, nan
>>> a = ma.array([[1,1,3], [3,4,2], [nan,nan,nan]], mask=[[0,0,0],[0,0,0],[1,1,1]])
>>> b = ma.array([[3,2,1], [3,4,2], [4,5,3]])
>>> c = ma.array([[4,3,2], [5,2,3], [nan,nan,nan]], mask=[[0,0,0],[0,0,0],[1,1,1]])
>>> X = ma.array([a, b, c])
Then taking the mean over any axis will ignore the masked values:
>>> X.mean(axis=0)
masked_array(data =
[[2.66666666667 2.0 2.0]
[3.66666666667 3.33333333333 2.33333333333]
[4.0 5.0 3.0]],
mask =
[[False False False]
[False False False]
[False False False]],
fill_value = 1e+20)
>>> X.mean(axis=1)
masked_array(data =
[[2.0 2.5 2.5]
[3.33333333333 3.66666666667 2.0]
[4.5 2.5 2.5]],
mask =
[[False False False]
[False False False]
[False False False]],
fill_value = 1e+20)
>>> X.mean(axis=2)
masked_array(data =
[[1.66666666667 3.0 --]
[2.0 3.0 4.0]
[3.0 3.33333333333 --]],
mask =
[[False False True]
[False False False]
[False False True]],
fill_value = 1e+20)
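If you would rather build the padded array from the ragged list than type the masks by hand, something along these lines might work. It is only a sketch: `pad_with_nan` is a hypothetical helper, and it assumes every row has the same number of columns. `np.nanmean` and `np.ma.masked_invalid` are existing NumPy functions that skip or mask the NaN padding.

```python
import numpy as np

ragged = [
    [[1, 1, 3], [3, 4, 2]],
    [[3, 2, 1], [3, 4, 2], [4, 5, 3]],
    [[4, 3, 2], [5, 2, 3]],
]

def pad_with_nan(blocks):
    """Stack ragged 2D blocks into one 3D float array, padding missing rows with NaN."""
    max_rows = max(len(b) for b in blocks)
    n_cols = len(blocks[0][0])              # assumes a constant column count
    padded = np.full((len(blocks), max_rows, n_cols), np.nan)
    for i, b in enumerate(blocks):
        padded[i, :len(b), :] = b           # fill the real rows, leave the rest NaN
    return padded

padded = pad_with_nan(ragged)

overall = np.nanmean(padded)                    # mean over all real values
per_column = np.nanmean(padded, axis=(0, 1))    # mean of each of the d columns
masked = np.ma.masked_invalid(padded)           # same data as a masked array
```

Here `masked.mean()` and `overall` should agree, since both ignore the padded entries.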
Upvotes: 2
Reputation: 4114
Assuming your array is actually [[[1,1,3],[3,4,2]],[[3,2,1],[3,4,2],[4,5,3]],[[4,3,2],[5,2,3]]],
which it should be since you mention a 3-dimensional array, the average can be found using loops:
>>> l = [[[1,1,3],[3,4,2]],[[3,2,1],[3,4,2],[4,5,3]],[[4,3,2],[5,2,3]]]
>>> s = 0; n = 0
>>> for i in l:          # first loop traverses the first dimension
...     for j in i:      # second loop traverses the second dimension
...         s += sum(j)
...         n += len(j)
...
>>> print("Average is", s / n)
Average is 2.857142857142857
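The same nested traversal can also be written without explicit loops by flattening the list first. This is just an alternative sketch using the standard library's `itertools.chain`; the names `values` and `average` are illustrative.

```python
from itertools import chain

l = [[[1, 1, 3], [3, 4, 2]],
     [[3, 2, 1], [3, 4, 2], [4, 5, 3]],
     [[4, 3, 2], [5, 2, 3]]]

# Flatten two levels of nesting: blocks -> rows -> individual numbers
values = list(chain.from_iterable(chain.from_iterable(l)))

average = sum(values) / len(values)
print("Average is", average)
```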
Upvotes: 0