Reputation: 628

Python numpy averaging

Averaging a table like this is not a problem

table = [[1,2,3,0],[1,2,3,0],[1,2,3,4]]

You can average each column with:

print numpy.average(table,axis=0)

But what if I have uneven sequences like:

table = [[1,2,3],[1,2,3],[1,2,3,4]]

Then the result should be:

1,2,3,4

The fourth column contains only one value, 4, so its average is 4/1 = 4. But numpy will not allow this:

ValueError: setting an array element with a sequence.

Upvotes: 2

Answers (2)

unutbu

Reputation: 880677

You could feed the data into a numpy masked array, then compute the means with np.ma.mean:

import numpy as np
data=[[1,2,3],[1,2,3],[1,2,3,4]]

rows=len(data)
cols=max(len(row) for row in data)
arr=np.ma.zeros((rows,cols))
arr.mask=True
for i,row in enumerate(data):
    arr[i,:len(row)]=row

print(arr.mean(axis=0))

yields

[1.0 2.0 3.0 4.0]

Elements of the array get unmasked (i.e. arr.mask[i,j]=False) when a value is assigned. Note the resultant mask below:

In [162]: arr
Out[162]: 
masked_array(data =
 [[1.0 2.0 3.0 --]
 [1.0 2.0 3.0 --]
 [1.0 2.0 3.0 4.0]],
             mask =
 [[False False False  True]
 [False False False  True]
 [False False False False]],
       fill_value = 1e+20)
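To see why the last column averages to 4, it can help to look at how many unmasked values each column holds. This is only a minimal sketch using the same data as above. It uses np.ma.masked_all as a shortcut for a fully masked array, which is a substitution for the np.ma.zeros plus arr.mask=True setup, and the names counts, totals and means are just illustrative.

```python
import numpy as np

data = [[1, 2, 3], [1, 2, 3], [1, 2, 3, 4]]

# Start with every cell masked; assigning a row unmasks the cells it fills.
cols = max(len(row) for row in data)
arr = np.ma.masked_all((len(data), cols), dtype=float)
for i, row in enumerate(data):
    arr[i, :len(row)] = row

counts = arr.count(axis=0)  # number of unmasked values in each column
totals = arr.sum(axis=0)    # sum of the unmasked values in each column
means = totals / counts     # same values as arr.mean(axis=0) for this data

print(counts)
print(means)
```

The last column has a count of 1 and a total of 4, so the masked cells contribute nothing to the mean.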

If your data set is small, yosukesabai's method or a pure Python solution is likely to be faster than what I show above. Only invest in building a masked array if the data is very large and you have enough numpy operations to perform on it to make the initial setup cost worthwhile.
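For reference, a pure Python version could look something like the sketch below. The function name column_means is hypothetical, and the sketch assumes every row holds plain numbers and that rows only differ in length at the end.

```python
def column_means(rows):
    """Average each column of a ragged list of rows, skipping missing cells."""
    totals = []  # running sum per column
    counts = []  # number of values seen per column
    for row in rows:
        for j, value in enumerate(row):
            if j == len(totals):
                # First time this column index appears: start its accumulators.
                totals.append(0.0)
                counts.append(0)
            totals[j] += value
            counts[j] += 1
    return [t / c for t, c in zip(totals, counts)]


data = [[1, 2, 3], [1, 2, 3], [1, 2, 3, 4]]
print(column_means(data))
```

This walks the data once and never builds a padded array, which is why it tends to suit small inputs.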

Upvotes: 3

yosukesabai

Reputation: 6244

The only workaround I can think of is to use itertools and a temporary list, which is not very elegant.

import numpy as np
from itertools import izip_longest
table = [[1,2,3],[1,2,3],[1,2,3,4]]

for row in izip_longest(*table):
    print np.average([x for x in row if x is not None])

This yields

1.0
2.0
3.0
4.0
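The code above is written for Python 2. In Python 3, izip_longest was renamed to zip_longest and print is a function, so a rough equivalent might look like the sketch below. It assumes None never appears as a real value in the table, since None is what zip_longest uses to pad the shorter rows, and the name means is just illustrative.

```python
from itertools import zip_longest

import numpy as np

table = [[1, 2, 3], [1, 2, 3], [1, 2, 3, 4]]

# zip_longest(*table) yields one tuple per column, padded with None
# wherever a row is too short; the padding is dropped before averaging.
means = [
    float(np.mean([x for x in column if x is not None]))
    for column in zip_longest(*table)
]

print(means)
```

Converting each result with float() keeps the printed list readable regardless of how the installed numpy version displays its scalar types.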

Upvotes: 2
