Reputation: 4398
I'm just a little confused here. Does the existing kmeans function in matlab have the ability to serially find the kmeans of data? Or does onlinephase mean something else?
Upvotes: 2
Views: 858
Reputation: 124563
No, the 'onlinephase'
is performed as a second step after the usual "assign-recompute" iterations (in batch mode). It guarantees to find a local minimum solution given the distance function used (also given the initial cluster centroids), by moving points between clusters until the total sum of distances cannot be further reduced.
Do not confuse this with finding the global minimum (which I believe is NP-hard problem)
This is well explained in the documentation:
Algorithms
kmeans uses a two-phase iterative algorithm to minimize the sum of point-to-centroid distances, summed over all k clusters:
The first phase uses batch updates, where each iteration consists of reassigning points to their nearest cluster centroid, all at once, followed by recalculation of cluster centroids. This phase occasionally does not converge to solution that is a local minimum, that is, a partition of the data where moving any single point to a different cluster increases the total sum of distances. This is more likely for small data sets. The batch phase is fast, but potentially only approximates a solution as a starting point for the second phase.
The second phase uses online updates, where points are individually reassigned if doing so will reduce the sum of distances, and cluster centroids are recomputed after each reassignment. Each iteration during the second phase consists of one pass though all the points. The second phase will converge to a local minimum, although there may be other local minima with lower total sum of distances. The problem of finding the global minimum can only be solved in general by an exhaustive (or clever, or lucky) choice of starting points, but using several replicates with random starting points typically results in a solution that is a global minimum.
Upvotes: 2