Reputation: 2882
Suppose I have labels that assign each element of an array to a cluster. For each cluster, I want the index, in the original array, of that cluster's maximum value. For example:
import numpy as np
labels = np.array([1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0])
y = np.random.randint(0, 9, len(labels))  # e.g. array([6, 7, 5, 4, 2, 8, 4, 4, 5, 6, 4])
I want [1, 5] because for cluster 1 the max is 7 at index 1, and for cluster 0 the max is 8 at index 5. Is it possible to get that without a for loop?
My naive solution for reference:
out = []
for i in [0, 1]:
    temp = y.copy()
    temp[labels == i] = -1  # hide cluster i so argmax picks from the other cluster
    out.append(np.argmax(temp))
Upvotes: 2
Views: 405
Reputation: 3396
I believe this should work:
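import numpy as np
labels = np.array([1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0])
y = np.array([6, 7, 5, 4, 2, 8, 4, 4, 5, 6, 4])
arr = np.unique(labels)  # distinct labels, sorted: array([0, 1])
# Column j masks the entries whose label equals arr[j]; argmax runs over the rest.
result = np.ma.masked_where(labels[:,None] == arr, np.tile(y,(arr.shape[0],1)).T).argmax(axis=0)
# result is array([1, 5])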
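The main idea is to add a new axis to the 1D labels array so that comparing it with arr broadcasts to a 2D boolean array with one column per distinct label. y is then tiled to the same shape, so a single masked argmax along axis 0 handles every column at once, with no explicit loop. Note that column j masks out the elements whose label equals arr[j], so its argmax is taken over the remaining elements; with exactly two clusters this gives the requested [1, 5].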
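With more than two clusters, masking out one label no longer isolates a single cluster, so the comparison would have to keep only the matching label in each column instead. Here is a minimal sketch of that variant, assuming integer or float values in y. The name argmax_per_label is a hypothetical helper, not part of NumPy, and np.where with -inf is swapped in for the masked array.

```python
import numpy as np

def argmax_per_label(labels, y):
    """Hypothetical helper: index of the max of y within each label group."""
    uniq = np.unique(labels)             # sorted distinct labels, shape (k,)
    member = labels[:, None] == uniq     # (n, k): True where element i has label uniq[j]
    # Keep y where the element belongs to the column's label, -inf elsewhere.
    scores = np.where(member, y[:, None], -np.inf)
    return uniq, scores.argmax(axis=0)   # one index into y per label

labels = np.array([1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0])
y = np.array([6, 7, 5, 4, 2, 8, 4, 4, 5, 6, 4])
uniq, idx = argmax_per_label(labels, y)
print(uniq, idx)  # [0 1] [5 1]
```

The indices come back in sorted-label order, so the example gives [5, 1] (cluster 0 first) rather than [1, 5]. The intermediate array has shape (n, k) and is converted to float, so this may not suit very many clusters or integers too large to be represented exactly as floats.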
Upvotes: 1