avoid column slicing array for calculations

Question

import numpy as np
Xs = np.array([[1,3,3,4,5,7], [2,4,5,1,1,6], [5,5,6,4,3,2]]).T
groupIDs = np.array([10,10,20,20,30,30])
p = np.array([0.5, 0.5, 0.25, 0.75, 1, 0])
_,idx,tags = np.unique(groupIDs, return_index=1, return_inverse=1)
print(Xs)
[[1 2 5]
 [3 4 5]
 [3 5 6]
 [4 1 4]
 [5 1 3]
 [7 6 2]]

I am trying to create a new table with the per-group sum of products between p and Xs, for each column. The only way I can think of to make this work is:

new = np.empty((6,3))
for i in range(3):
    new[:,i] = np.add.reduceat((p * Xs[:,i]),idx)[tags] 
print(new)
[[ 2.    3.    5.  ]
 [ 2.    3.    5.  ]
 [ 3.75  2.    4.5 ]
 [ 3.75  2.    4.5 ]
 [ 5.    1.    3.  ]
 [ 5.    1.    3.  ]]

I am struggling to tune my mind into thinking 'vector-wise' so that it runs (hopefully) faster on my large dataset, consisting of thousands of xs, by avoiding the loop. Any suggestions, please?

Nuageux · Accepted Answer

import numpy as np
Xs = np.array([[1,3,3,4,5,7], [2,4,5,1,1,6], [5,5,6,4,3,2]])
groupIDs = np.array([10,10,20,20,30,30])
p = np.array([0.5, 0.5, 0.25, 0.75, 1, 0])
_,idx,tags = np.unique(groupIDs, return_index=1, return_inverse=1)

print(np.add.reduceat((p*Xs).T, idx)[tags])

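No need for a for loop. It is enough to transpose the array before reducing; see the last line. np.add.reduceat reduces along the first axis by default, so it sums every column at once.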

I removed the transpose in the declaration of Xs. If you really need it, you'll have to add one in the last line instead: (p*Xs.T).T
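
As a quick sanity check, here is a minimal sketch that keeps the question's original (transposed) Xs and compares the loop against a vectorized call. It uses p[:, None] * Xs, which should be equivalent to (p*Xs.T).T, and passes axis=0 explicitly for readability. The names looped and vectorized are only illustrative. It assumes, as in the example data, that groupIDs is sorted so that each group's rows are contiguous; reduceat sums over consecutive slices and would give different results otherwise.

```python
import numpy as np

# Data from the question, with Xs transposed to shape (6, 3)
Xs = np.array([[1, 3, 3, 4, 5, 7],
               [2, 4, 5, 1, 1, 6],
               [5, 5, 6, 4, 3, 2]]).T
groupIDs = np.array([10, 10, 20, 20, 30, 30])  # assumed sorted / contiguous groups
p = np.array([0.5, 0.5, 0.25, 0.75, 1, 0])

# idx: first row of each group; tags: group index of every row
_, idx, tags = np.unique(groupIDs, return_index=True, return_inverse=True)

# Loop version from the question, one column at a time
looped = np.empty(Xs.shape)
for i in range(Xs.shape[1]):
    looped[:, i] = np.add.reduceat(p * Xs[:, i], idx)[tags]

# Vectorized version: broadcast p down the rows, reduce all columns at once
vectorized = np.add.reduceat(p[:, None] * Xs, idx, axis=0)[tags]

print(np.allclose(looped, vectorized))
```

If the group IDs are not sorted in your real data, sorting the rows by groupIDs first (for example with np.argsort) is one way to satisfy that assumption.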
