Elisabeth

Reputation: 146

k-medoids How are new centroids picked?

My understanding of k-medoids is that the initial centroids are picked randomly from the existing points. Clusters are formed by assigning each remaining point to its nearest centroid, and the error is computed as the total absolute distance.

a) How are the new centroids picked? From the examples it seems they are picked randomly, and the error is calculated again to see whether the new centroids are better or worse.

b) How do you know when to stop picking new centroids?

Upvotes: 4

Views: 1942

Answers (3)

Aditya

Reputation: 1068

Before answering, a brief overview of k-medoids is needed; I give it in the first two steps below, and the last two steps answer your questions.

1) In the first step of k-medoids, k centroids/medoids are picked from your dataset. If your dataset contains n points, the k medoids are chosen from those n points. You can choose them randomly, or you can use a smart-initialization approach like the one used in k-means++.

2) The second step is the assignment step: for each point in your dataset, compute its distance to each of the k medoids, find the minimum, and add the point to the set S_j corresponding to the nearest centroid C_j (the k centroids being C_1, C_2, ..., C_k).

3) The third step of the algorithm is the update step. This answers your question about how new centroids are picked after they have been initialized. An example makes it clearer. Suppose your dataset has ten points (x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8, x_9, x_10) and the problem is a 2-cluster one, so we first choose 2 centroids/medoids randomly from these ten points; say they are (x_2, x_5). The assignment step stays the same. In the update step, you take the points that are not medoids (the points apart from x_2 and x_5), and for each candidate you repeat the assignment step and compute the loss, i.e. the squared distance of the points x_i from the medoids. You then compare the loss found using medoid x_2 with the loss found using the non-medoid point. If the loss is reduced, you swap x_2 with the non-medoid point that reduced it; if the loss is not reduced, you keep x_2 as your medoid and don't swap. So there can be a lot of swaps in the update step, which is what makes this algorithm computationally expensive.

4) The last step answers your second question, i.e. when to stop picking new centroids. When you compare the loss of the current medoid/centroid with the loss computed for a non-medoid and the difference is negligible, you can stop and keep the medoid as the centroid. But if the reduction in loss is significant, you keep performing swaps until the loss stops decreasing.
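The four steps above can be sketched in Python. This is a minimal illustration, not a library implementation: it assumes 1-D points, uses absolute distance for the loss (as in the question), and all the names (`assign`, `cost`, `kmedoids`) are made up for the example.

```python
import random

def assign(points, medoids):
    """Assignment step: map each medoid to the points nearest to it."""
    clusters = {m: [] for m in medoids}
    for p in points:
        nearest = min(medoids, key=lambda m: abs(p - m))
        clusters[nearest].append(p)
    return clusters

def cost(clusters):
    """Total loss: sum of absolute distances from points to their medoid."""
    return sum(abs(p - m) for m, pts in clusters.items() for p in pts)

def kmedoids(points, k, seed=0):
    random.seed(seed)
    medoids = random.sample(points, k)          # step 1: pick k medoids at random
    best = cost(assign(points, medoids))        # step 2: assign and score
    improved = True
    while improved:                             # step 4: stop when no swap helps
        improved = False
        for i in range(k):
            for o in points:                    # step 3: try swapping medoid i with o
                if o in medoids:
                    continue
                candidate = medoids[:i] + [o] + medoids[i + 1:]
                c = cost(assign(points, candidate))
                if c < best:                    # keep the swap only if the loss drops
                    medoids, best, improved = candidate, c, True
    return medoids, best
```

On a toy dataset with two obvious groups, e.g. `[1.0, 2.0, 3.0, 10.0, 11.0, 12.0]` with k=2, this converges to the medoids 2.0 and 11.0 regardless of the random start, because every other configuration has an improving swap.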

I hope that answers your questions.

Upvotes: 1

user1721713

Reputation: 493

It's worth reading the Wikipedia page on the k-medoids algorithm. You are right that the k medoids are selected randomly from the n data points in the first step.

The new medoids are picked by looping over every medoid m and every non-medoid o, swapping them, and calculating the cost again. If the cost increased, you undo the swap.

The algorithm stops if there is no swap for a full iteration.
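That swap-and-undo loop can be written compactly. A sketch under assumed names: `d` is a hypothetical distance function supplied by the caller, and the points are kept as a flat list.

```python
def total_cost(points, medoids, d):
    """Total clustering cost: each point contributes its distance to the nearest medoid."""
    return sum(min(d(p, m) for m in medoids) for p in points)

def improve_until_stable(points, medoids, d):
    best = total_cost(points, medoids, d)
    changed = True
    while changed:                         # stop: a full iteration with no swap
        changed = False
        for i in range(len(medoids)):
            for o in points:
                if o in medoids:
                    continue
                old = medoids[i]
                medoids[i] = o             # tentative swap: medoid -> non-medoid o
                c = total_cost(points, medoids, d)
                if c < best:
                    best, changed = c, True
                else:
                    medoids[i] = old       # cost did not drop: undo the swap
    return medoids, best
```

For example, starting from medoids `[1.0, 12.0]` on `[1.0, 2.0, 3.0, 10.0, 11.0, 12.0]` with absolute distance, the loop settles on 2.0 and 11.0 and then terminates because a full pass produces no accepted swap.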

Upvotes: 3

Has QUIT--Anony-Mousse

Reputation: 77505

The process for choosing the initial medoids is fairly complicated; many people seem to just use random initial centers instead.

After this, k-medoids always considers every possible change of replacing one of the medoids with a non-medoid. The best such change is then applied, if it improves the result. If no further improvement is possible, the algorithm stops.
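This "best swap" step differs from accepting the first improving swap: every (medoid, non-medoid) replacement is scored, and only the single best one is applied per iteration. A sketch with illustrative names (`best_swap_step`, and a caller-supplied `cost` function):

```python
def best_swap_step(points, medoids, cost):
    """Evaluate every medoid/non-medoid replacement; return the best
    improving (cost, medoids) pair, or None if none improves (converged)."""
    base = cost(points, medoids)
    best = None
    for i in range(len(medoids)):
        for o in points:
            if o in medoids:
                continue
            trial = medoids[:i] + [o] + medoids[i + 1:]
            c = cost(points, trial)
            if c < base and (best is None or c < best[0]):
                best = (c, trial)
    return best  # None means: stop, no swap improves the result
```

The caller applies steps in a loop until `None` comes back, which is exactly the stopping rule described above.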

Don't rely on vague descriptions. Read the original publications.

Upvotes: 1
