R_Moose

Reputation: 105

Conditional mean in numpy arrays?

I have a numpy array named "distances" which looks like this:

[[ 5.  1.  1.  1.  2.  1.  3.  1.  1.  1.]
 [ 5.  4.  4.  5.  7. 10.  3.  2.  1.  1.]
 [ 3.  1.  1.  1.  2.  2.  3.  1.  1.  0.]
 [ 6.  8.  8.  1.  3.  4.  3.  7.  1.  1.]
 [ 4.  1.  1.  3.  2.  1.  3.  1.  1.  1.]
 [ 8. 10. 10.  8.  7. 10.  9.  7.  1.  1.]
 [ 1.  1.  1.  1.  2. 10.  3.  1.  1.  0.]
 [ 2.  1.  2.  1.  2.  1.  3.  1.  1.  0.]
 [ 2.  1.  1.  1.  2.  1.  1.  1.  5.  2.]
 [ 4.  2.  1.  1.  2.  1.  2.  1.  1.  1.]]

I want to build a new 3*9 numpy array of conditional means, like this:

  1. For all rows where the last column is 0, define an array c0 (1*9) in which each entry is the mean of the corresponding column over those rows.
  2. For all rows where the last column is 1, define an array c1 (1*9) in which each entry is the mean of the corresponding column over those rows.
  3. For all rows where the last column is 2, define an array c2 (1*9) in which each entry is the mean of the corresponding column over those rows.

After that I am using hstack to get the final 3*9 array. I am sure this is the long approach, and my attempt is wrong anyway.

code:

c0 = distances.mean(axis=0)

final = np.hstack((c0,c1,c2))

Doing this I get a 1*10 array where each entry is the average of a column of the distances array. However, I cannot find a way to take the average only over the rows whose last column is 0.

Upvotes: 0

Views: 948

Answers (2)

Divakar

Reputation: 221534

With pandas

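This is straightforward with pandas:

import pandas as pd

df = pd.DataFrame(distances)
# Group rows by the last column (label 9) and average the remaining columns
df_out = df.groupby(df.shape[1]-1).mean()
# Append the group value (0, 1, 2) as an extra column; skip this line for a plain 3*9 result
df_out['ID'] = df_out.index
out = df_out.values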

With NumPy

Using Custom-function

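For a NumPy-specific one, we can use groupbycol, a custom helper (not part of NumPy) that performs group-based summations, and then divide by the group sizes, like so:

sums  = groupbycol(distances, assume_sorted_col=False, colID=-1)
# np.bincount needs integer input, so cast the float labels before counting
out = sums/np.bincount(distances[:,-1].astype(int)).astype(float)[:,None]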
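Since groupbycol is not defined in this thread, here is a minimal sketch of the same sum-then-divide idea using only NumPy built-ins. It is not the groupbycol implementation; it swaps in np.add.at for the group-based summation. The small array `toy` and the variable names are made up for illustration, and the sketch assumes the last column holds non-negative integer labels with no empty group.

```python
import numpy as np

# Hypothetical toy data: two feature columns plus a label column (0, 1 or 2)
toy = np.array([[1., 2., 0.],
                [3., 8., 1.],
                [5., 6., 0.],
                [7., 8., 2.]])

labels = toy[:, -1].astype(int)   # integer group ids
features = toy[:, :-1]            # everything except the label column

# Accumulate each row into the slot of its group (unbuffered scatter-add)
sums = np.zeros((labels.max() + 1, features.shape[1]))
np.add.at(sums, labels, features)

# Divide the per-group sums by the per-group row counts
counts = np.bincount(labels)
out = sums / counts[:, None]
print(out)
```

If some label between 0 and the maximum never occurs, its count is 0 and that row of `out` becomes nan, so such groups would need separate handling.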

With matrix-multiplication

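# One-hot membership mask: mask[i, k] is True when row i has last-column value k
mask = distances[:,-1,None] == np.arange(distances[:,-1].max()+1)
# Per-group row sums divided by group sizes (the last column of out is the group value itself)
out = mask.T.dot(distances)/mask.sum(0)[:,None].astype(float)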
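To make the mask trick easier to follow, here is a small sketch that spells out the intermediate shapes. The array `toy` and the variable names are hypothetical, chosen only for illustration. Unlike the two-liner above, it drops the label column before multiplying, so the result has one column fewer than the input.

```python
import numpy as np

# Hypothetical toy data: two feature columns plus a label column (0, 1 or 2)
toy = np.array([[1., 2., 0.],
                [3., 8., 1.],
                [5., 6., 0.],
                [7., 8., 2.]])

labels = toy[:, -1].astype(int)

# Shape (4, 3): one row per input row, one column per group
mask = labels[:, None] == np.arange(labels.max() + 1)

# (3, 4) @ (4, 2) -> (3, 2): row k is the sum of all rows belonging to group k
group_sums = mask.T.astype(float) @ toy[:, :-1]

# Column sums of the mask are the group sizes
group_sizes = mask.sum(axis=0)
group_means = group_sums / group_sizes[:, None]
print(group_means)
```

The matrix product does more arithmetic than strictly necessary (it multiplies by many zeros), but it stays fully vectorized, which tends to pay off when the number of groups is small.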

Upvotes: 1

R_Moose

Reputation: 105

I was able to do it like this:

c0 = (distances[distances[:,-1] == 0][:,0:9]).mean(axis=0)
c1 = (distances[distances[:,-1] == 1][:,0:9]).mean(axis=0)
c2 = (distances[distances[:,-1] == 2][:,0:9]).mean(axis=0)
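For completeness, a sketch of how these three rows can be combined into the 3*9 array from the question. It uses the same boolean-indexing idea, but loops over np.unique so the label values are not hard-coded, and it stacks with np.vstack. np.hstack on three length-9 arrays would give a flat array of 27 values rather than 3 rows. The names `labels` and `final` are illustrative.

```python
import numpy as np

# The "distances" array from the question
distances = np.array([
    [5., 1., 1., 1., 2., 1., 3., 1., 1., 1.],
    [5., 4., 4., 5., 7., 10., 3., 2., 1., 1.],
    [3., 1., 1., 1., 2., 2., 3., 1., 1., 0.],
    [6., 8., 8., 1., 3., 4., 3., 7., 1., 1.],
    [4., 1., 1., 3., 2., 1., 3., 1., 1., 1.],
    [8., 10., 10., 8., 7., 10., 9., 7., 1., 1.],
    [1., 1., 1., 1., 2., 10., 3., 1., 1., 0.],
    [2., 1., 2., 1., 2., 1., 3., 1., 1., 0.],
    [2., 1., 1., 1., 2., 1., 1., 1., 5., 2.],
    [4., 2., 1., 1., 2., 1., 2., 1., 1., 1.],
])

labels = distances[:, -1]

# One row of column means per distinct label, using the first 9 columns only
final = np.vstack([
    distances[labels == k, :-1].mean(axis=0)
    for k in np.unique(labels)
])
print(final.shape)  # (3, 9)
```

np.unique returns the labels in sorted order, so the rows of `final` correspond to c0, c1 and c2.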

Upvotes: 0
