siamii

Reputation: 24104

calculate mean across 3 dimensional array of different length

I have a 3 dimensional array with dimensions n * d * m. However, ni can vary. It looks something like this

[[[1,1,3],  [[3,2,1],  [[4,3,2],  
  [3,4,2]],  [3,4,2],   [5,2,3]]]
             [4,5,3]],

I need to calculate the mean across all data points. I was using the scipy.stats.mean function, but it threw an error about mismatching dimensions. Therefore, I was thinking about padding the array to the largest ni so that it has uniform dimensions, something like this:

[[[  1,  1,  3],  [[3,2,1],  [[  4,  3,  2],  
  [  3,  4,  2],   [3,4,2],   [  5,  2,  3],
  [NaN,NaN,NaN]],  [4,5,3]],  [NaN,NaN,NaN]]] 

but I don't know if this is the best solution or how I could calculate the mean with NaN.

Any suggestions?

Upvotes: 3

Views: 4405

Answers (3)

Jan

Reputation: 5162

It looks like all numpy array functions fail because your object is a list of 2D arrays rather than a 3D numpy array.

A solution is to unroll the list and compute the means separately. In your case this will be:

In [23]: import numpy as np

In [24]: L = np.array([[[1,1,3],[3,4,2]],
                  [[3,2,1],[3,4,2],[4,5,3]],[[4,3,2],[5,2,3]]],
                  dtype=object)  # ragged input: recent NumPy versions require dtype=object

In [28]: Lsum = ( np.sum(L[0]) + np.sum(L[1]) + np.sum(L[2]) )

In [29]: Lmean = float(Lsum) \
                   / ( np.size(L[0]) + np.size(L[1]) + np.size(L[2]) )

In [46]: Lmean
Out[46]: 2.857142857142857

This can be put into a loop over the sub-arrays when their number varies.
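A minimal sketch of that loop, assuming each block is stored as its own 2D array in a plain Python list. The helper name ragged_mean is hypothetical and not part of NumPy.

```python
import numpy as np

# Ragged data from the question: three blocks with 2, 3 and 2 rows.
blocks = [
    np.array([[1, 1, 3], [3, 4, 2]]),
    np.array([[3, 2, 1], [3, 4, 2], [4, 5, 3]]),
    np.array([[4, 3, 2], [5, 2, 3]]),
]

def ragged_mean(blocks):
    """Mean over every element of a list of arrays (hypothetical helper)."""
    total = sum(np.sum(b) for b in blocks)   # sum of all elements
    count = sum(np.size(b) for b in blocks)  # number of elements
    return total / count

overall = ragged_mean(blocks)

# Since every block has the same number of columns, the blocks can also be
# stacked row-wise into one ordinary 2D array and averaged directly.
stacked = np.concatenate(blocks, axis=0)  # shape (7, 3)
column_means = stacked.mean(axis=0)       # one mean per column
```

Stacking only works because the blocks differ in their row count alone; the loop also handles blocks whose shapes differ in other ways.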

Upvotes: 0

Fred Foo

Reputation: 363597

You could use masked arrays:

>>> from numpy import ma, nan
>>> a = ma.array([[1,1,3], [3,4,2], [nan,nan,nan]], mask=[[0,0,0],[0,0,0],[1,1,1]])
>>> b = ma.array([[3,2,1], [3,4,2], [4,5,3]])
>>> c = ma.array([[4,3,2], [5,2,3], [nan,nan,nan]], mask=[[0,0,0],[0,0,0],[1,1,1]])
>>> X = ma.array([a, b, c])

Then taking the mean over any axis will ignore the masked values:

>>> X.mean(axis=0)
masked_array(data =
 [[2.66666666667 2.0 2.0]
 [3.66666666667 3.33333333333 2.33333333333]
 [4.0 5.0 3.0]],
             mask =
 [[False False False]
 [False False False]
 [False False False]],
       fill_value = 1e+20)

>>> X.mean(axis=1)
masked_array(data =
 [[2.0 2.5 2.5]
 [3.33333333333 3.66666666667 2.0]
 [4.5 2.5 2.5]],
             mask =
 [[False False False]
 [False False False]
 [False False False]],
       fill_value = 1e+20)

>>> X.mean(axis=2)
masked_array(data =
 [[1.66666666667 3.0 --]
 [2.0 3.0 4.0]
 [3.0 3.33333333333 --]],
             mask =
 [[False False  True]
 [False False False]
 [False False  True]],
       fill_value = 1e+20)
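For the mean over all data points, which is what the question asks for, the same idea can be sketched as follows. This assumes the blocks start out as nested lists and are padded with NaN programmatically; the variable names are illustrative only. np.ma.masked_invalid masks the NaN entries, and np.nanmean is an alternative that skips NaN without building a mask.

```python
import numpy as np

blocks = [
    [[1, 1, 3], [3, 4, 2]],
    [[3, 2, 1], [3, 4, 2], [4, 5, 3]],
    [[4, 3, 2], [5, 2, 3]],
]

# Pad every block with NaN rows up to the length of the longest block.
max_rows = max(len(b) for b in blocks)
n_cols = len(blocks[0][0])
padded = np.full((len(blocks), max_rows, n_cols), np.nan)
for i, b in enumerate(blocks):
    padded[i, :len(b), :] = b

# Option 1: mask the NaN padding, then average over everything.
masked = np.ma.masked_invalid(padded)
overall_masked = masked.mean()

# Option 2: let nanmean ignore the NaN padding directly.
overall_nan = np.nanmean(padded)

# Axis-wise means work the same way, e.g. across blocks per position.
per_position = np.nanmean(padded, axis=0)
```

Both options give the same overall mean. np.nanmean emits a RuntimeWarning and returns NaN for any slice that contains only padding, whereas the masked array returns a masked entry there.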

Upvotes: 2

asheeshr

Reputation: 4114

Assuming your array is actually [[[1,1,3],[3,4,2]],[[3,2,1],[3,4,2],[4,5,3]],[[4,3,2],[5,2,3]]] (which it should be, since you mention a 3-dimensional array), the average can be found using loops:

>>> l = [[[1,1,3],[3,4,2]],[[3,2,1],[3,4,2],[4,5,3]],[[4,3,2],[5,2,3]]]
>>> s = 0; n=0;

>>> for i in l:    #First loop traverses through the first dimension
        for j in i:    #Traverses through the second dimension
            s += sum(j)
            n += len(j)


>>> print("Average is ", s/n)
Average is  2.857142857142857
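The nested loops can also be written by flattening the list first. This is a sketch using only the standard library, assuming the data is nested exactly two levels above the numbers:

```python
from itertools import chain

l = [[[1, 1, 3], [3, 4, 2]],
     [[3, 2, 1], [3, 4, 2], [4, 5, 3]],
     [[4, 3, 2], [5, 2, 3]]]

# The inner chain yields the rows of every block; the outer one yields
# the individual numbers inside those rows.
values = list(chain.from_iterable(chain.from_iterable(l)))

average = sum(values) / len(values)
print("Average is", average)
```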

Upvotes: 0
