Reputation: 2039
I use k-means clustering with random initialization to identify clusters. The algorithm works well on clean data. But when the data is very noisy, my k-means loses its robustness and gives a different solution on every run over the same data set.
So I decided to modify my k-means clustering to minimize the Ward criterion:
I wrote this algorithm in C++ here. However, the problem is that this approach is extremely slow: I am dealing with clusters of roughly 20,000 points each.
Can you suggest a better solution, or help me speed up this algorithm?
Upvotes: 0
Views: 895
Reputation: 2039
I finally found the solution. I've realized that:
What definitely helped me was mean normalization. I ran k-means 5 times and computed the mean of the cluster centers over those runs. Finally, I ran k-means once more with the computed means as the initial solution.
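For anyone who wants to try the same thing, here is a minimal C++ sketch of the restart-and-average idea, not my original code. Several details are assumptions for illustration: the points are 2-D, the function names (`kmeans`, `random_init`, `kmeans_averaged_init`) are made up, and the centers of each run are matched to those of the first run by a greedy nearest-center rule before averaging. That matching step is needed because cluster labels are arbitrary from run to run; other matching rules may work better on very noisy data.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

using Point = std::array<double, 2>;  // assumption: 2-D data

double sq_dist(const Point& a, const Point& b) {
    const double dx = a[0] - b[0], dy = a[1] - b[1];
    return dx * dx + dy * dy;
}

// Plain Lloyd iterations starting from the given centers.
std::vector<Point> kmeans(const std::vector<Point>& pts, std::vector<Point> centers,
                          int max_iter = 100) {
    const std::size_t k = centers.size();
    for (int it = 0; it < max_iter; ++it) {
        std::vector<Point> sum(k, Point{0.0, 0.0});
        std::vector<std::size_t> count(k, 0);
        for (const Point& p : pts) {
            std::size_t best = 0;  // index of the nearest center
            for (std::size_t j = 1; j < k; ++j)
                if (sq_dist(p, centers[j]) < sq_dist(p, centers[best])) best = j;
            sum[best][0] += p[0];
            sum[best][1] += p[1];
            ++count[best];
        }
        bool moved = false;
        for (std::size_t j = 0; j < k; ++j) {
            if (count[j] == 0) continue;  // empty cluster: keep the old center
            const Point c{sum[j][0] / count[j], sum[j][1] / count[j]};
            if (sq_dist(c, centers[j]) > 1e-18) moved = true;
            centers[j] = c;
        }
        if (!moved) break;  // converged
    }
    return centers;
}

// Random initialization: k distinct data points become the starting centers.
std::vector<Point> random_init(const std::vector<Point>& pts, std::size_t k,
                               std::mt19937& rng) {
    assert(k > 0 && k <= pts.size());
    std::vector<std::size_t> idx(pts.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::shuffle(idx.begin(), idx.end(), rng);
    std::vector<Point> centers;
    for (std::size_t j = 0; j < k; ++j) centers.push_back(pts[idx[j]]);
    return centers;
}

// Run k-means `runs` times from random starts, average the matched centers,
// then run k-means once more with the averages as the initial solution.
std::vector<Point> kmeans_averaged_init(const std::vector<Point>& pts, std::size_t k,
                                        int runs, unsigned seed) {
    assert(runs > 0);
    std::mt19937 rng(seed);
    const std::vector<Point> ref = kmeans(pts, random_init(pts, k, rng));
    std::vector<Point> mean = ref;  // running sum, starts with the first run
    for (int r = 1; r < runs; ++r) {
        const std::vector<Point> c = kmeans(pts, random_init(pts, k, rng));
        std::vector<bool> used(k, false);
        for (std::size_t j = 0; j < k; ++j) {
            // Assumption: greedy matching to the nearest unused center of this run.
            std::size_t best = k;
            for (std::size_t m = 0; m < k; ++m)
                if (!used[m] &&
                    (best == k || sq_dist(ref[j], c[m]) < sq_dist(ref[j], c[best])))
                    best = m;
            used[best] = true;
            mean[j][0] += c[best][0];
            mean[j][1] += c[best][1];
        }
    }
    for (Point& m : mean) { m[0] /= runs; m[1] /= runs; }
    return kmeans(pts, mean);
}
```

Calling `kmeans_averaged_init(points, k, 5, seed)` corresponds to the 5 runs plus the final run described above. Each run is an ordinary k-means pass, so the total cost is about six times that of a single run.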
Upvotes: 2