Implementing numpy bincount to change half of most common value?

Question

I am implementing the KMeans algorithm using numpy.

I am making a numpy array named distances like this:

[[ 5.  1.  1.  1.  2.  1.  3.  1.  1.  1.]
 [ 5.  4.  4.  5.  7. 10.  3.  2.  1.  0.]
 [ 3.  1.  1.  1.  2.  2.  3.  1.  1.  1.]
 [ 6.  8.  8.  1.  3.  4.  3.  7.  1.  1.]
 [ 4.  1.  1.  3.  2.  1.  3.  1.  1.  1.]
 [ 8. 10. 10.  8.  7. 10.  9.  7.  1.  0.]
 [ 1.  1.  1.  1.  2. 10.  3.  1.  1.  0.]
 [ 2.  1.  2.  1.  2.  1.  3.  1.  1.  1.]
 [ 2.  1.  1.  1.  2.  1.  1.  1.  5.  1.]
 [ 4.  2.  1.  1.  2.  1.  2.  1.  1.  1.]]

The first 9 columns are the features of each data point, and the last column is the cluster the data point is assigned to after the centroids are randomly initialized.

I would like all of the values 0, 1 and 2 to appear in the last column. In the array above, only 0 and 1 appear. In that case I want to change half of the occurrences of the most common value in the last column to 2.

k=3
for c in range(k):
    if c not in distances[:, -1]:
        x = np.bincount(distances[:,-1]).argmax()
        distances[:len(distances[distances[:,-1]==x])/2,-1][distances[:,-1] == x] = c

However, this is not working. Can someone help me fix this problem?

The error is:

IndexError: boolean index did not match indexed array along dimension 0; dimension is 0 but corresponding boolean dimension is 10

Sridhar Murali · Accepted Answer

I think this might help you.

Assuming distance is the variable that holds the array:

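import numpy as np

# Most common label in the last column (bincount needs non-negative integers)
x = np.bincount(distance[:, -1].astype(int)).argmax()
# Row indices currently assigned to that label
pos = np.flatnonzero(distance[:, -1] == x)
# Reassign the first half of those rows to cluster 2
distance[pos[:len(pos) // 2], -1] = 2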
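The attempt in the question fails because it applies a boolean mask built from the full 10-row column to a shortened slice of that column, so the shapes do not match. Indexing the rows by position avoids this. As a rough sketch, the same idea can be generalized to any k; the function name fill_missing_clusters and the demo array are made up for illustration, and it assumes the labels in the last column are non-negative whole numbers:

```python
import numpy as np

def fill_missing_clusters(data, k):
    """Make every label 0..k-1 appear in the last column of `data`.

    Hypothetical helper: for each missing label, half of the rows that
    hold the currently most common label are reassigned to it.
    Modifies `data` in place and returns it.
    """
    for c in range(k):
        # bincount only accepts non-negative integers, so cast the float column
        labels = data[:, -1].astype(int)
        if c in labels:
            continue
        most_common = np.bincount(labels, minlength=k).argmax()
        # positions of the rows carrying the most common label
        rows = np.flatnonzero(labels == most_common)
        # integer division keeps the slice bound an int
        data[rows[: len(rows) // 2], -1] = c
    return data

# Demo: same labels as the last column in the question, features zeroed out
demo = np.column_stack([np.zeros(10), [1, 0, 1, 1, 1, 0, 0, 1, 1, 1]])
fill_missing_clusters(demo, 3)
print(demo[:, -1])  # [2. 0. 2. 2. 1. 0. 0. 1. 1. 1.]
```

Note that if the most common label occurs only once, half of its rows rounds down to zero and nothing is reassigned, so a real KMeans implementation would probably need a different fallback for that case.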

I hope this helps!
