ALEX.VAMVAS
ALEX.VAMVAS

Reputation: 107

Selection of initial medoids in PAM algorith

I have read a couple of different articles on how PAM selects the initial medoids but I am getting conflicting views.

Some propose that the k first medoids are selected randomly, while others suggest that the algorithm selects initially the k representative medoids in the dataset (not clarifying how that "representativeness" happens though). Below I have listed these resources:

Medoid calculation

Drawbacks of K-Medoid (PAM) Algorithm

https://paginas.fe.up.pt/~ec/files_1112/week_06_Clustering_part_II.pdf

https://www.datanovia.com/en/lessons/k-medoids-in-r-algorithm-and-practical-examples/

1) My question would be if someone could explain in more detail how the algorithm selects the initial k medoids as from what I understand different initial selection can lead to different results.

2) Also is that one of the reasons of using CLARA (apart from minimizing computing time and RAM storage problem) - that is to find medoids through resampling that are the "optimal" options?

I am using R as a parenthesis, with the function pam(). Open to other functions in other libraries if there is a better alternative I am not aware of.

Upvotes: 1

Views: 420

Answers (1)

Has QUIT--Anony-Mousse
Has QUIT--Anony-Mousse

Reputation: 77474

Read the original sources.

There is a lot of nonsense written later, unfortunately.

PAM consists of two algorithms:

  1. BUILD to choose the initial medoids (not randomly)
  2. SWAP to make the best improvements (not k-means style)

The k-means style algorithm works much worse than PAM. Any description of PAM that doesn't mention these two parts is inaccurate (and there are quite some of these...)

The R package seems to use the real PAM algorithm:

By default, when medoids are not specified, the algorithm first looks for a good initial set of medoids (this is called the build phase). Then it finds a local minimum for the objective function, that is, a solution such that there is no single switch of an observation with a medoid that will decrease the objective (this is called the swap phase)

CLARA clearly will find worse solutions than PAM, as it runs PAM on a sample, and I'd the optimum medoids are not in the sample, then they cannot be found.

Upvotes: 2

Related Questions