Updating centroid values in k-means clustering (Python)

Question

I am trying to implement k-means clustering from scratch in Python, and I am having trouble updating the centroid values for each cluster. The code below shows what I have so far. Each data point has already been assigned to one of k clusters. AllData contains 329 rows; each row is a word, followed by 300 features, followed by the number of the cluster it has been assigned to (1 to 4). In my loop, I first create an array A that holds only the rows of AllData assigned to the first cluster. I then take the mean of each feature column in A and set that cluster's centroid to the result. The loop should repeat this for all 4 clusters.

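k = 4
for i in range(1, k + 1):
    # Feature columns (1..300) of the rows whose cluster label (column 301) equals i
    A = AllData[:, 1:301][AllData[:, 301] == i]
    # Row i-1 of centroids becomes the per-feature mean of cluster i
    centroids[i - 1, :] = A.mean(axis=0)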

The 4 rows of the centroids array update correctly. The problem is that the 4 updated centroid values also overwrite the first 4 rows of AllData. I don't want this to happen; AllData should remain unchanged. Any help would be much appreciated!

Answers (1)
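
The question does not show how centroids was created, so the following is only a guess at the cause. If centroids was initialised by slicing AllData (for example, taking its first 4 rows as starting centroids), NumPy returns a view that shares memory with AllData, and every write to centroids also lands in AllData. A minimal sketch of that situation and one possible fix, using an explicit .copy(), is below. The small random array, the names all_data, centroids_view and original, and the initialisation itself are assumptions for illustration; the word column is left out so the array stays numeric.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for AllData: 8 rows, 3 feature columns,
# then a cluster label (1..k) in the last column.
k = 2
n_features = 3
features = rng.normal(size=(8, n_features))
labels = np.array([1, 1, 2, 2, 1, 2, 1, 2], dtype=float)
all_data = np.column_stack([features, labels])
original = all_data.copy()  # kept only to check that all_data is untouched

# Assumed cause: basic slicing returns a view, so this array shares
# memory with all_data and assignments to it would modify all_data too.
centroids_view = all_data[:k, :n_features]
is_view = np.shares_memory(centroids_view, all_data)

# Possible fix: copy the initial centroids so they own their memory.
centroids = all_data[:k, :n_features].copy()

for i in range(1, k + 1):
    # Boolean-mask indexing already returns a copy of the matching rows.
    members = all_data[all_data[:, -1] == i, :n_features]
    centroids[i - 1] = members.mean(axis=0)

unchanged = np.array_equal(all_data, original)
```

In the original code, np.shares_memory(centroids, AllData) can be used to check whether this is what is happening; if it returns True, creating centroids with .copy() should keep AllData intact.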