Long Phan

Reputation: 119

K-means++ algorithm

I read the paper k-means++: The Advantages of Careful Seeding and didn't quite understand the algorithm it provides, which is:

"Let D(x) denote the shortest distance from a data point x to the closest center we have already chosen.

1a. Choose an initial center c1 uniformly at random from X.

1b. Choose the next center ci, selecting ci = x' ∈ X with probability (D(x')^2) / Sum_of(D(x)^2)

1c. Repeat Step 1b until we have chosen a total of k centers.

2-4. Proceed as with the standard k-means algorithm"

(Better look at the algorithm in the link above)

Especially step 1b. What do they mean by "selecting ci = x' ∈ X with probability (D(x')^2) / Sum_of(D(x)^2)"? Do they mean selecting the element that has the largest proportion? And how can performing such a computation lead to selecting good centroids?

Upvotes: 5

Views: 1495

Answers (3)

Ocean_Duan

Reputation: 1

The main difference between K-Means and K-Means++ is how the initial centers are chosen. K-Means selects the initial centers randomly. K-Means++, before selecting each new center, calculates the distances from all points to the centers already chosen, so that the selected centers tend to be far away from each other. If a center is chosen near another one, the points assigned to it need many iterations to separate into distinct clusters. However, if two centers are far enough apart, the probability that their points belong to the same cluster is very small, and it is easy to separate the clusters.

Upvotes: 0

Beta

Reputation: 99094

The function D(x) is defined for all points x ∈ X.

In step 1b, we must choose some point to be a new center. We will choose randomly between all points (that are not already centers). But we will not give an equal chance to every point; we will assign different probabilities to different points before we choose. These probabilities must add up to 1.

Consider D(x)^2. We can evaluate this at every point, and add up the values: Sum_of(D(x)^2).

Then we can assign each point x' a probability equal to D(x')^2 / Sum_of(D(x)^2). These probabilities add up to 1, and give a better chance to points far away from all existing centers.
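To make the weighting concrete, here is a minimal NumPy sketch of that one selection step (the toy data and the seed are mine, just for illustration): every point gets probability D(x)^2 / Sum_of(D(x)^2), and the next center is sampled from that distribution rather than picked as the single farthest point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data and one already-chosen center (illustrative values).
X = rng.random((10, 2))
centers = [X[0]]

# D(x)^2: squared distance from each point to its nearest chosen center.
d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)

# Probabilities proportional to D(x)^2; they sum to 1.
probs = d2 / d2.sum()

# Sample the index of the next center with those probabilities.
next_idx = rng.choice(len(X), p=probs)
```

Note that the point already chosen as a center has D(x) = 0, so it gets probability 0 and can never be picked again, while far-away points get the largest (but not certain) chance.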

Upvotes: 5

Jerry

Reputation: 4408

http://en.wikipedia.org/wiki/K-means%2B%2B

   1. Choose one center uniformly at random from among the data points.
   2. For each data point x, compute D(x), the distance between x and the nearest center that has already been chosen.
   3. Choose one new data point at random as a new center, using a weighted probability distribution where a point x is chosen with probability proportional to D(x)^2.
   4. Repeat Steps 2 and 3 until k centers have been chosen.
   5. Now that the initial centers have been chosen, proceed using standard k-means clustering.
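The steps above can be sketched in NumPy like this (a minimal illustration, not a production implementation; the function name `kmeans_pp_init` is mine):

```python
import numpy as np

def kmeans_pp_init(X, k, rng=None):
    """Pick k initial centers following the k-means++ seeding steps."""
    rng = np.random.default_rng(rng)
    n = len(X)
    # Step 1: first center uniformly at random from the data points.
    centers = [X[rng.integers(n)]]
    for _ in range(k - 1):
        # Step 2: D(x)^2 = squared distance to the nearest chosen center.
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        # Step 3: sample a new center with probability D(x)^2 / sum D(x)^2.
        centers.append(X[rng.choice(n, p=d2 / d2.sum())])
    # Step 4 is the loop above; step 5 (standard k-means) runs afterwards.
    return np.array(centers)
```

The returned array would then be passed as the initial centers to an ordinary k-means loop.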

Upvotes: 0
