Reputation: 119
I've recently read the Single Pass Seed Selection Algorithm for k-Means article, but not really understand the algorithm, which is:
Dist
in which Dist (i,j)
represents distance from i
to j
Sumv
in which Sumv (i)
is the sum of the distances from i
th point to all other points.i
which is min (Sumv)
and set Index = i
C
as the first centroidxi
, set D (xi)
to be the distance between xi
and the nearest point in C
y
as the sum of distances of first n/k
nearest points from the Index
i
so that D(x1)^2+D(x2)^2+...+D(xi)^2 >= y > D(x1)^2+D(x2)^2+...+D(x(i-1))^2
xi
to C
k
centersEspecially step 6, do we still use the same Index
(same point) over and over or we use the newly added point from C
? And about step 8, does i
have to be larger than 1
?
Upvotes: 2
Views: 556
Reputation: 1
Single Pass Seed Selection algorithm is a novel algorithm. Single Pass mean that without any iterations first seed can be selected. k-means++ performance is depends on first seed. It is overcome in SPSS. Please gothrough the paper "Robust Seed Selestion Algorithm for k-means" from the same authors
John J. Louis
Upvotes: -1
Reputation: 6514
Honestly, I wouldn't worry about understanding that paper - its not very good.
You will get a lot more benefit from learning and understanding the k-means++ algorithm they are comparing against, and reading some of the history from that.
If you really want to understand what they are doing, I would brush up on your matlab and read their provided matlab code. But its not really worth it. If you look up the quantile seed selection algorithm, they are essentially doing something very similar. Instead of using the distance to the first seed to sort the points, they appear to be using the sum of pairwise distances (which means they don't need an initial seed, hence the unique solution).
Upvotes: 4