Tony

Reputation: 793

Avoid column-slicing an array for calculations

import numpy as np
Xs = np.array([[1,3,3,4,5,7], [2,4,5,1,1,6], [5,5,6,4,3,2]]).T
groupIDs = np.array([10,10,20,20,30,30])
p = np.array([0.5, 0.5, 0.25, 0.75, 1, 0])
_,idx,tags = np.unique(groupIDs, return_index=1, return_inverse=1)
print(Xs)
[[1 2 5]
 [3 4 5]
 [3 5 6]
 [4 1 4]
 [5 1 3]
 [7 6 2]]

I am trying to create a new table containing, for each column of Xs, the per-group sum of the products of p and that column. The only way I can think of to make this work is:

new = np.empty((6,3))
for i in range(3):
    new[:,i] = np.add.reduceat((p * Xs[:,i]),idx)[tags] 
print(new)
[[ 2.    3.    5.  ]
 [ 2.    3.    5.  ]
 [ 3.75  2.    4.5 ]
 [ 3.75  2.    4.5 ]
 [ 5.    1.    3.  ]
 [ 5.    1.    3.  ]]

I am struggling to tune my mind into thinking 'vector-wise' so that this runs (hopefully) faster on my large dataset, which consists of thousands of Xs, by avoiding the loop. Any suggestions, please?

Upvotes: 2

Views: 45

Answers (2)

akuiper

Reputation: 214957

Here is another option without np.unique (assuming rows with the same groupIDs value are already grouped together):

def diff():
    idx = np.concatenate(([0], np.flatnonzero(np.diff(groupIDs))+1))
    inv = np.repeat(np.arange(idx.size), np.diff(np.concatenate((idx, [groupIDs.size]))))
    return np.add.reduceat((Xs.T*p), idx, axis=1).T[inv]

diff()
#array([[ 2.  ,  3.  ,  5.  ],
#       [ 2.  ,  3.  ,  5.  ],
#       [ 3.75,  2.  ,  4.5 ],
#       [ 3.75,  2.  ,  4.5 ],
#       [ 5.  ,  1.  ,  3.  ],
#       [ 5.  ,  1.  ,  3.  ]])
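
For reuse, the same idea can be sketched as a function that takes its inputs as arguments rather than reading globals. The name grouped_weighted_sum is only illustrative, and the sketch still assumes that equal group IDs sit in adjacent rows. It ends by comparing the result with the column-by-column loop from the question.

```python
import numpy as np

def grouped_weighted_sum(Xs, p, groupIDs):
    """Per-group sum of p * Xs for each column, repeated for every row.

    Assumes rows with the same group ID are adjacent (e.g. already sorted).
    """
    # Start position of each run of equal IDs.
    starts = np.concatenate(([0], np.flatnonzero(np.diff(groupIDs)) + 1))
    # Length of each run, used to map group sums back onto the rows.
    lengths = np.diff(np.concatenate((starts, [groupIDs.size])))
    inv = np.repeat(np.arange(starts.size), lengths)
    # Weight each row by p, then sum each run along axis 0.
    sums = np.add.reduceat(Xs * p[:, None], starts, axis=0)
    return sums[inv]

Xs = np.array([[1,3,3,4,5,7], [2,4,5,1,1,6], [5,5,6,4,3,2]]).T
groupIDs = np.array([10,10,20,20,30,30])
p = np.array([0.5, 0.5, 0.25, 0.75, 1, 0])

result = grouped_weighted_sum(Xs, p, groupIDs)

# Compare against the column-by-column loop from the question.
_, idx, tags = np.unique(groupIDs, return_index=True, return_inverse=True)
expected = np.empty(Xs.shape)
for i in range(Xs.shape[1]):
    expected[:, i] = np.add.reduceat(p * Xs[:, i], idx)[tags]
assert np.allclose(result, expected)
```

Here p[:, None] turns p into a column so it broadcasts across all columns of Xs at once, which avoids the two transposes.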

Upvotes: 2

Nuageux

Reputation: 1686

import numpy as np
Xs = np.array([[1,3,3,4,5,7], [2,4,5,1,1,6], [5,5,6,4,3,2]])
groupIDs = np.array([10,10,20,20,30,30])
p = np.array([0.5, 0.5, 0.25, 0.75, 1, 0])
_,idx,tags = np.unique(groupIDs, return_index=1, return_inverse=1)

print(np.add.reduceat((p*Xs).T, idx)[tags])

There is no need for a for loop. Multiplying by p and transposing the result is enough; see the last line.

I removed the transpose from the declaration of Xs. If you really need it, add one in the last line instead: (p*Xs.T).T
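
As a rough sketch of that second case, here is the same one-liner with Xs kept in the (6, 3) layout from the question. The names weighted_t and weighted_b are just for illustration. It also shows that broadcasting p as a column should give the same weighted array as the double transpose.

```python
import numpy as np

# Xs keeps the (6, 3) layout from the question: one row per observation.
Xs = np.array([[1,3,3,4,5,7], [2,4,5,1,1,6], [5,5,6,4,3,2]]).T
groupIDs = np.array([10,10,20,20,30,30])
p = np.array([0.5, 0.5, 0.25, 0.75, 1, 0])
_, idx, tags = np.unique(groupIDs, return_index=True, return_inverse=True)

# Two equivalent ways to weight every row of Xs by its entry in p.
weighted_t = (p * Xs.T).T      # transpose, multiply, transpose back
weighted_b = p[:, None] * Xs   # broadcast p as a column vector
assert np.array_equal(weighted_t, weighted_b)

# Sum each group along axis 0, then expand back to one row per observation.
new = np.add.reduceat(weighted_b, idx)[tags]
print(new)
```

Note that using idx from np.unique with reduceat relies on the group IDs being sorted and contiguous, as they are in this example.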

Upvotes: 2
