Reputation: 793
import numpy as np
Xs = np.array([[1,3,3,4,5,7], [2,4,5,1,1,6], [5,5,6,4,3,2]]).T
groupIDs = np.array([10,10,20,20,30,30])
p = np.array([0.5, 0.5, 0.25, 0.75, 1, 0])
_,idx,tags = np.unique(groupIDs, return_index=1, return_inverse=1)
print(Xs)
[[1 2 5]
[3 4 5]
[3 5 6]
[4 1 4]
[5 1 3]
[7 6 2]]
I am trying to create a new table containing, for each column, the per-group sum of the products between p and Xs. The only way I can think of to make this work is a loop over the columns:
new = np.empty((6,3))
for i in range(3):
    new[:,i] = np.add.reduceat((p * Xs[:,i]),idx)[tags]
print(new)
[[ 2. 3. 5. ]
[ 2. 3. 5. ]
[ 3.75 2. 4.5 ]
[ 3.75 2. 4.5 ]
[ 5. 1. 3. ]
[ 5. 1. 3. ]]
I am struggling to think 'vector-wise' here. I would like to avoid the loop so that the code runs (hopefully) faster on my large dataset, which has thousands of x columns. Any suggestions, please?
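To make the roles of idx and tags in the loop above concrete, here is a small sketch that prints them for the example groupIDs, assuming (as in the example) that equal IDs are already adjacent. The name x0 is introduced only for illustration and holds the first column of Xs.

```python
import numpy as np

groupIDs = np.array([10, 10, 20, 20, 30, 30])
p = np.array([0.5, 0.5, 0.25, 0.75, 1, 0])
x0 = np.array([1, 3, 3, 4, 5, 7])  # first column of Xs (illustrative name)

# idx: first row of each unique ID; tags: group number of every row
_, idx, tags = np.unique(groupIDs, return_index=True, return_inverse=True)
print(idx)   # [0 2 4]
print(tags)  # [0 0 1 1 2 2]

# reduceat sums the slices [0:2], [2:4] and [4:] of the weighted column
sums = np.add.reduceat(p * x0, idx)
print(sums)        # group sums 2.0, 3.75 and 5.0
# fancy indexing with tags copies each group sum back onto its rows
print(sums[tags])  # one value per original row
```

Since reduceat only works on contiguous slices, this relies on the rows of each group being next to each other.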
Upvotes: 2
Views: 45
Reputation: 214957
Here is another option without np.unique (it assumes that rows with the same groupIDs are already sorted together):
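import numpy as np

def diff():
    # start index of each run of identical, contiguous group IDs
    idx = np.concatenate(([0], np.flatnonzero(np.diff(groupIDs))+1))
    # group number of every row, built by repeating 0..k-1 by the run lengths
    inv = np.repeat(np.arange(idx.size), np.diff(np.concatenate((idx, [groupIDs.size]))))
    return np.add.reduceat((Xs.T*p), idx, axis=1).T[inv]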
diff()
#array([[ 2. , 3. , 5. ],
# [ 2. , 3. , 5. ],
# [ 3.75, 2. , 4.5 ],
# [ 3.75, 2. , 4.5 ],
# [ 5. , 1. , 3. ],
# [ 5. , 1. , 3. ]])
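To see how diff() builds its indices without np.unique, here is a sketch that exposes the intermediate arrays for the example groupIDs. The names steps, starts, lengths, ids2 and idx2 are introduced only for illustration. The last lines show why the sortedness assumption matters: groups are defined by runs of equal IDs, so an ID that reappears later is treated as a separate group.

```python
import numpy as np

groupIDs = np.array([10, 10, 20, 20, 30, 30])

# positions where the ID changes between consecutive rows
steps = np.diff(groupIDs)             # [0, 10, 0, 10, 0]
starts = np.flatnonzero(steps) + 1    # [2, 4]
idx = np.concatenate(([0], starts))   # [0, 2, 4]: first row of each run

# run lengths: distance from each start to the next start (or to the end)
lengths = np.diff(np.concatenate((idx, [groupIDs.size])))  # [2, 2, 2]
inv = np.repeat(np.arange(idx.size), lengths)              # [0, 0, 1, 1, 2, 2]

# Caveat: runs, not IDs, define the groups, so [10, 20, 10] gives three groups
ids2 = np.array([10, 20, 10])
idx2 = np.concatenate(([0], np.flatnonzero(np.diff(ids2)) + 1))  # [0, 1, 2]
```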
Upvotes: 2
Reputation: 1686
import numpy as np
Xs = np.array([[1,3,3,4,5,7], [2,4,5,1,1,6], [5,5,6,4,3,2]])
groupIDs = np.array([10,10,20,20,30,30])
p = np.array([0.5, 0.5, 0.25, 0.75, 1, 0])
_,idx,tags = np.unique(groupIDs, return_index=1, return_inverse=1)
print(np.add.reduceat((p*Xs).T, idx)[tags])
There is no need for a for loop; it is enough to transpose one matrix (see the last line).
I removed the transpose from the declaration of Xs. If you really need it, you'll have to add one in the last line: np.add.reduceat((p*Xs.T).T, idx)[tags]
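If the rows are not sorted by group ID, the reduceat-based approaches do not apply directly. One alternative, a swapped-in technique rather than either answer's method, is a sketch using np.add.at, which accumulates into repeated indices without buffering, together with p[:, None] so that Xs can keep the question's (6, 3) layout with no transposes. The function name group_weighted_sums is hypothetical.

```python
import numpy as np

def group_weighted_sums(Xs, p, groupIDs):
    """Sum p * Xs within each group, column by column, and broadcast back to rows."""
    # tags[i] is the group number of row i; rows do not need to be sorted by ID
    uniq, tags = np.unique(groupIDs, return_inverse=True)
    # p[:, None] has shape (n, 1), so it scales every column of Xs (n, k)
    weighted = p[:, None] * Xs
    sums = np.zeros((uniq.size, Xs.shape[1]))
    # unbuffered scatter-add: rows that share a tag accumulate into one row
    np.add.at(sums, tags, weighted)
    # fancy indexing copies each group's sums back onto its rows
    return sums[tags]

Xs = np.array([[1, 3, 3, 4, 5, 7], [2, 4, 5, 1, 1, 6], [5, 5, 6, 4, 3, 2]]).T
groupIDs = np.array([10, 10, 20, 20, 30, 30])
p = np.array([0.5, 0.5, 0.25, 0.75, 1, 0])
print(group_weighted_sums(Xs, p, groupIDs))
```

np.add.at has historically been slower per element than reduceat, so when the IDs are already sorted the reduceat versions are likely the better fit; this variant only buys independence from row order.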
Upvotes: 2