Reputation: 55
I am working on an existing algorithm to improve its complexity. The existing algorithm uses K-means to perform clustering, whereas I chose to use K-means++ to do the same.
K-means++ was chosen because it mostly has faster and more accurate clustering results compared to K-mean.
Now, towards the end, where I have to compare the complexity of the new and existing algorithms, I find that I can't make sense of the fact that K-means++ has a complexity of O(logk) competitive.
I have tried looking everywhere on the web for an explanation, including stack overflow.
The only thing I have understood is that competitive has something to do with "on-line" and "off-line" algorithms. Could anyone please explain how it applies here?
Upvotes: 3
Views: 1443
Reputation: 59194
The full sentence that you are reading says something like "The k-means++ clustering is O(log k)-competitive to the optimal k-means solution".
This is not a statement about its algorithmic complexity. It's a statement about its effectiveness. You can use O-notation for other things.
K-means attempts to minimize a "potential" that is calculated as the sum of the squared distances of points from their cluster centers.
For any specific clustering problem, the expected potential of a K-means++ solution is at most 8(ln k + 2)
times the potential of the best possible solution. That 8(ln k + 2)
is shortened to O(log k)
for brevity.
The precise meaning of the statement that the k-means++ solution is O(log k)-competitive is that there is some constant C
such that the expected ratio between the k-means++ potential and the best possible potential is less than C*(log k) for all sufficiently large k.
the smallest such constant is about 8
Upvotes: 4