Reputation: 628
Averaging a rectangular table like this is not a problem:
table = [[1,2,3,0],[1,2,3,0],[1,2,3,4]]
You can simply call:
print(numpy.average(table, axis=0))
But what if I have uneven sequences like:
table = [[1,2,3],[1,2,3],[1,2,3,4]]
Then the result should be:
1,2,3,4
The fourth column contains only a single value, 4, so its average is 4/1 = 4. But numpy will not allow this:
ValueError: setting an array element with a sequence.
Upvotes: 2
Views: 665
Reputation: 879093
You could feed the data into a numpy masked array, then compute the means with np.ma.mean:
import numpy as np

data = [[1,2,3],[1,2,3],[1,2,3,4]]
rows = len(data)
cols = max(len(row) for row in data)
# Start with an array in which every element is masked
arr = np.ma.zeros((rows, cols))
arr.mask = True
for i, row in enumerate(data):
    # Assigning values unmasks the corresponding elements
    arr[i, :len(row)] = row
print(arr.mean(axis=0))
yields
[1.0 2.0 3.0 4.0]
Elements of the array get unmasked (i.e. arr.mask[i,j]=False) when a value is assigned. Note the resultant mask below:
In [162]: arr
Out[162]:
masked_array(data =
[[1.0 2.0 3.0 --]
[1.0 2.0 3.0 --]
[1.0 2.0 3.0 4.0]],
mask =
[[False False False True]
[False False False True]
[False False False False]],
fill_value = 1e+20)
If your data is rather short, yosukesabai's method or a pure Python solution is likely to be faster than what I show above. Only invest in making a masked array if the data is very large and you have enough numpy operations to perform on the array to make the initial cost of setting up the array worth it.
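For what it's worth, a pure Python version might look like the sketch below. The helper name column_means is hypothetical (it is not from any library), and the code assumes Python 3, where / is true division and the itertools function is called zip_longest.

```python
from itertools import zip_longest  # named izip_longest on Python 2


def column_means(table):
    """Average each column, counting only the rows long enough to reach it."""
    means = []
    # zip_longest pads the shorter rows with None, so each tuple is one column
    for column in zip_longest(*table):
        present = [x for x in column if x is not None]
        means.append(sum(present) / len(present))
    return means


print(column_means([[1, 2, 3], [1, 2, 3], [1, 2, 3, 4]]))
```

This avoids building any array at all, which is why it tends to win on small inputs. It does assume that None never appears as a real value in the data.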
Upvotes: 3
Reputation: 6244
The only workaround I can think of is to use itertools and a temporary list, which is not very pretty.
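import numpy as np
from itertools import zip_longest  # named izip_longest on Python 2

table = [[1,2,3],[1,2,3],[1,2,3,4]]
# zip_longest pads the shorter rows with None, so each tuple is one column
for column in zip_longest(*table):
    print(np.average([x for x in column if x is not None]))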
This yields
1.0
2.0
3.0
4.0
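A possible variation on the same idea, if you would rather get a single array back than print one value per column: pad with NaN instead of None and let np.nanmean skip the padding. This is a different technique from the loop above, shown only as a sketch. It assumes Python 3 and a NumPy version that provides np.nanmean, and that the data itself contains no NaN values.

```python
import numpy as np
from itertools import zip_longest

table = [[1, 2, 3], [1, 2, 3], [1, 2, 3, 4]]

# Each tuple from zip_longest is one column, padded with NaN where a row is too short
columns = np.array(list(zip_longest(*table, fillvalue=np.nan)), dtype=float)

# nanmean ignores the NaN padding, so each column is averaged over its real values only
means = np.nanmean(columns, axis=1)
print(means)
```

Here columns has one row per original column, which is why the mean is taken along axis=1.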
Upvotes: 2