Reputation: 105
I have a numpy array named "distances" which looks like this:
[[ 5. 1. 1. 1. 2. 1. 3. 1. 1. 1.]
[ 5. 4. 4. 5. 7. 10. 3. 2. 1. 1.]
[ 3. 1. 1. 1. 2. 2. 3. 1. 1. 0.]
[ 6. 8. 8. 1. 3. 4. 3. 7. 1. 1.]
[ 4. 1. 1. 3. 2. 1. 3. 1. 1. 1.]
[ 8. 10. 10. 8. 7. 10. 9. 7. 1. 1.]
[ 1. 1. 1. 1. 2. 10. 3. 1. 1. 0.]
[ 2. 1. 2. 1. 2. 1. 3. 1. 1. 0.]
[ 2. 1. 1. 1. 2. 1. 1. 1. 5. 2.]
[ 4. 2. 1. 1. 2. 1. 2. 1. 1. 1.]]
I want to make a new 3*9 numpy array by taking mean like this:
Post doing this I am doing hstack to get final 3*9 array. I am sure this is the long approach but none the less wrong.
code:
c0=distances.mean(axis=1)
final = np.hstack((c0,c1,c2))
Doing this I get 1*10 array where each column is average of each column from distances array, however I am unable to find a way to do so on a condition that only take average when last column of rows is 0 only ?
Upvotes: 0
Views: 948
Reputation: 221534
pandas
Would be straight-forward with pandas
-
import pandas as pd
df = pd.DataFrame(distances)
df_out = df.groupby(df.shape[1]-1).mean()
df_out['ID'] = df_out.index
out = df_out.values
NumPy
Using Custom-function
For a NumPy-specific one, we can use groupbycol
(perform group-based summations) and hence solve our case, like so -
sums = groupbycol(distances, assume_sorted_col=False, colID=-1)
out = sums/np.bincount(distances[:,-1]).astype(float)[:,None]
With matrix-multiplication
mask = distances[:,-1,None] == np.arange(distances[:,-1].max()+1)
out = mask.T.dot(distances)/mask.sum(0)[:,None].astype(float)
Upvotes: 1
Reputation: 105
I was able to do it like this:
c0= (distances[distances[:,-1] == 0][:,0:9]).mean(axis=0)
c1 = (distances[distances[:,-1] == 1][:,0:9]).mean(axis=0)
c2 = (distances[distances[:,-1] == 2][:,0:9]).mean(axis=0)
Upvotes: 0