Reputation: 105
I am implementing the KMeans algorithm using NumPy.
I have built a NumPy array named distances that looks like this:
[[ 5. 1. 1. 1. 2. 1. 3. 1. 1. 1.]
[ 5. 4. 4. 5. 7. 10. 3. 2. 1. 0.]
[ 3. 1. 1. 1. 2. 2. 3. 1. 1. 1.]
[ 6. 8. 8. 1. 3. 4. 3. 7. 1. 1.]
[ 4. 1. 1. 3. 2. 1. 3. 1. 1. 1.]
[ 8. 10. 10. 8. 7. 10. 9. 7. 1. 0.]
[ 1. 1. 1. 1. 2. 10. 3. 1. 1. 0.]
[ 2. 1. 2. 1. 2. 1. 3. 1. 1. 1.]
[ 2. 1. 1. 1. 2. 1. 1. 1. 5. 1.]
[ 4. 2. 1. 1. 2. 1. 2. 1. 1. 1.]]
The first 9 columns are the data points, and the last column is the cluster each data point is assigned to for the randomly initialized centroids.
I would like the last column to contain all of the values 0, 1 and 2. In the array above it contains only 0 and 1. In that case I intend to change half of the occurrences of the most common value in the last column to 2.
k = 3
for c in range(k):
    if c in distances[:, -1]:
        pass
    else:
        x = np.bincount(distances[:,-1]).argmax()
        distances[:len(distances[distances[:,-1]==x])/2,-1][distances[:,-1] == x] = c
However, this is not working. Can someone help me fix this problem?
error -> IndexError: boolean index did not match indexed array along dimension 0; dimension is 0 but corresponding boolean dimension is 10
Upvotes: 0
Views: 208
Reputation: 390
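I think this might help you.
If distance is the variable that holds the array:
values, counts = np.unique(distance[:, -1], return_counts=True)
x = values[counts.argmax()]  # most common value in the last column
pos = np.argwhere(distance[:, -1] == x).flatten()  # rows holding that value
for i in range(len(pos) // 2):
    distance[pos[i], -1] = 2  # index through pos, not the loop counter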
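As for the IndexError itself: distances[:n, -1] is a slice of length n, while the mask distances[:, -1] == x has one entry per row (10 here), so the two shapes cannot match. Two further things to watch for: in Python 3, / returns a float and cannot be used as a slice bound (use //), and np.bincount only accepts non-negative integers, so the float label column has to be cast first. Below is a rough sketch of the same idea generalized to any k. The function name fill_missing_clusters and the two-column demo array are my own, made up for illustration; the demo labels are copied from the last column of the array in the question.

```python
import numpy as np

def fill_missing_clusters(distances, k):
    """Give every label in range(k) that is absent half of the most common label's rows."""
    labels = distances[:, -1].astype(int)  # bincount needs non-negative integers
    for c in range(k):
        if c in labels:
            continue
        x = np.bincount(labels, minlength=k).argmax()  # most common label right now
        pos = np.flatnonzero(labels == x)              # row indices holding that label
        labels[pos[: len(pos) // 2]] = c               # relabel the first half of them
    distances[:, -1] = labels                          # write the labels back in place
    return distances

# Hypothetical demo: one feature column plus the label column from the question.
demo = np.array([[5., 1.], [5., 0.], [3., 1.], [6., 1.], [4., 1.],
                 [8., 0.], [1., 0.], [2., 1.], [2., 1.], [4., 1.]])
result = fill_missing_clusters(demo.copy(), k=3)
print(result[:, -1])
```

Note that this relabels the first half of the matching rows in array order. If you would rather pick them at random, you could shuffle pos before slicing it.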
I hope this helps!
Upvotes: 1