Reputation: 353
I was going through the k-means Wikipedia page. Based on the algorithm, I think the complexity is O(n*k*i) (n = total number of elements, k = number of clusters, i = number of iterations).
So can someone explain this statement from Wikipedia, and how this problem is NP-hard?
If $k$ and $d$ (the dimension) are fixed, the problem can be exactly solved in time $O(n^{dk+1} \log n)$, where $n$ is the number of entities to be clustered.
Upvotes: 31
Views: 60123
Reputation: 363627
It depends on what you call k-means.
The problem of finding the global optimum of the k-means objective function

$$\sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^2$$

is NP-hard, where $S_i$ is cluster $i$ (and there are $k$ clusters), $x_j$ is a $d$-dimensional point in cluster $S_i$, and $\mu_i$ is the centroid (average of the points) of cluster $S_i$.
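To get a feel for why the exact problem is expensive, here is a minimal sketch of the naive exact approach: evaluate the objective described above for every one of the k^n possible labelings and keep the best. The function names `sse` and `brute_force_kmeans` are made up for this illustration. This only shows how large the search space is; it is not a proof of NP-hardness, and it is not the faster exact algorithm that the Wikipedia statement refers to.

```python
from itertools import product

def sse(points, labels, k):
    """k-means objective: squared distance of every point to its cluster mean."""
    total = 0.0
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        if not members:
            continue  # an empty cluster contributes nothing
        d = len(members[0])
        mean = [sum(p[j] for p in members) / len(members) for j in range(d)]
        total += sum((p[j] - mean[j]) ** 2 for p in members for j in range(d))
    return total

def brute_force_kmeans(points, k):
    """Try all k**n labelings and keep one with the smallest objective."""
    best_labels, best_cost = None, float("inf")
    for labels in product(range(k), repeat=len(points)):
        cost = sse(points, labels, k)
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels, best_cost

# Four points forming two obvious groups; 2**4 = 16 labelings are checked.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels, cost = brute_force_kmeans(points, k=2)
print(labels, cost)
```

With only 4 points this is instant, but the number of labelings grows as k^n, so this approach is already hopeless for a few dozen points.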
However, running a fixed number t of iterations of the standard algorithm takes only O(t*k*n*d) time for n d-dimensional points, where k is the number of centroids (or clusters). This is what practical implementations do (often with random restarts between the iterations).
The standard algorithm only approximates a local optimum of the above function, and so do all the k-means algorithms that I've seen.
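To make the O(t*k*n*d) count concrete, here is a rough pure-Python sketch of the standard iterations (commonly called Lloyd's algorithm). The function name `lloyd`, the `seed` parameter, and the `distance_terms` counter are assumptions added for illustration; real implementations vectorize this and usually add a convergence check.

```python
import random

def lloyd(points, k, t, seed=0):
    """Run exactly t iterations of the standard k-means heuristic."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    d = len(points[0])
    labels = [0] * len(points)
    distance_terms = 0  # counts scalar (x - mu)**2 terms evaluated
    for _ in range(t):
        # Assignment step: n points x k centroids x d coordinates.
        for idx, p in enumerate(points):
            best_c, best_dist = 0, float("inf")
            for c, mu in enumerate(centroids):
                dist = 0.0
                for j in range(d):
                    dist += (p[j] - mu[j]) ** 2
                    distance_terms += 1
                if dist < best_dist:
                    best_c, best_dist = c, dist
            labels[idx] = best_c
        # Update step: one pass over the points, O(n*d + k*d).
        sums = [[0.0] * d for _ in range(k)]
        counts = [0] * k
        for p, c in zip(points, labels):
            counts[c] += 1
            for j in range(d):
                sums[c][j] += p[j]
        for c in range(k):
            if counts[c]:  # keep the old centroid if a cluster is empty
                centroids[c] = [s / counts[c] for s in sums[c]]
    return centroids, labels, distance_terms

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
t, k = 5, 2
centroids, labels, terms = lloyd(points, k=k, t=t)
print(terms, t * k * len(points) * len(points[0]))
```

The counter always equals t*k*n*d, which is where the bound comes from. Which clustering you get depends on the initial centroids, so the result can be a poor local optimum; that is why restarts with different seeds are common.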
Upvotes: 46
Reputation: 2965
The problem is NP-hard because another well-known NP-hard problem can be reduced to the (planar) k-means problem. Have a look at the paper "The Planar k-means Problem is NP-hard" (by Mahajan et al.) for more info.
Upvotes: 1