Reputation: 115
I'm using the following code for clustering with KMeans from sklearn.cluster.KMeans
from sklearn.cluster import KMeans
num_clusters = 60
km = KMeans(n_clusters=num_clusters, init="k-means++", n_init = 100)
%time km.fit(newvec)
clusters = km.labels_.tolist()
To avoid local minima I use n_init = 100. What else can be done to avoid falling into local minima. I appreciate your help.
Upvotes: 1
Views: 4386
Reputation: 16
There is no way to technically avoid the chances of getting stuck at a local minimum, but there are two ways I can think of lowering the chances you will get a far from optimal solution.
Upvotes: 0
Reputation: 77
Another "hack" - I've had success by randomizing the cluster assignments during iterations, for example (in R) by multiplying the Euclidian distance by
(1 + runif(1,0,1.0)/iteration)
so in the first iterations, the distance may be up to twice as large as it really is, and as iterations go up, the true distance is used. In my examples this has helped me out of local minima!
Upvotes: 0
Reputation: 77454
Why do you think you need the global minimum?
Any clustering is only a heuristic. There is no such thing as the optimal clustering, because this is an explorative technique.
Also, your data probably is not "exact" but you only have a subset, limited precision etc. - "optimality" is completely overrated.
Compare the best result you got from 10 initializations to the best you got from 100 initializations. Does it make a difference? How much longer would you wait to get the same improvement again?
Upvotes: 0
Reputation: 2240
I'm afraid that with the k-means algorithm, you cannot entirely avoid local minima; you can only try to minimize your chances of getting one.
This Cross Validated post has a good discussion on why you cannot escape local minima.
A common hack used by many, and the one you are doing by setting n_init = 100
is to run K-means multiple times and then choosing the run that gives the lowest error. If you run this k^n times and then choose the best out of that, then you WOULD be guaranteed you're finding a global minima, but that's too time consuming to be practical.
Upvotes: 2