Reputation: 13
I have two NumPy arrays: A with shape (N, 3) and B with shape (N,). From A I generate the array of its unique rows, e.g.:
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [1., 2., 3.],
              [7., 8., 9.]])
B = np.array([10., 33., 15., 17.])
AUnique, directInd, inverseInd, counts = np.unique(A,
                                                   return_index=True,
                                                   return_inverse=True,
                                                   return_counts=True,
                                                   axis=0)
So that AUnique will be
array([[1., 2., 3.],
       [4., 5., 6.],
       [7., 8., 9.]])
while directInd is [0, 1, 3], inverseInd is [0, 1, 0, 2], and counts is [2, 1, 1].
Then I build the vector of B values associated with AUnique, and for each duplicated row in A I sum the associated values of B into this vector, that is:
BNew = B[directInd]
# here BNew is [10., 33., 17.]
for Id in np.asarray(counts > 1).nonzero()[0]:
    BNew[Id] = np.sum(B[inverseInd == Id])
# here BNew is [25., 33., 17.]
The problem is that the for loop becomes extremely slow for large N (millions or tens of millions of rows), and I was wondering whether there is a way to avoid the loop and/or make the code much faster.
Thanks in advance!
Upvotes: 1
Views: 159
Reputation: 14399
I think you can do what you want with np.bincount:
BNew = np.bincount(inverseInd, weights=B)
BNew
Out[]: array([25., 33., 17.])
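For completeness, here is a minimal end-to-end sketch combining np.unique with np.bincount (the helper name sum_duplicates is just illustrative, and the .ravel() is a defensive assumption in case your NumPy version returns the inverse indices with an extra axis):

import numpy as np

def sum_duplicates(A, B):
    # Collapse duplicate rows of A; BNew[i] is the sum of B over
    # all rows of A equal to AUnique[i].
    AUnique, inverseInd = np.unique(A, return_inverse=True, axis=0)
    # np.bincount accumulates each B value into the bin given by its
    # inverse index, which is exactly the per-group sum we want.
    BNew = np.bincount(inverseInd.ravel(), weights=B)
    return AUnique, BNew

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [1., 2., 3.],
              [7., 8., 9.]])
B = np.array([10., 33., 15., 17.])

AUnique, BNew = sum_duplicates(A, B)
print(BNew)  # [25. 33. 17.]

This replaces the Python-level loop with one vectorized pass over B; for tens of millions of rows, the row sorting inside np.unique is likely to dominate the runtime.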
Upvotes: 1