Roy

Reputation: 19

How to assess the efficiency of unsupervised algorithms? How does the pam algorithm work?

I am working with k-means and k-medoids. Running k-means prints the following information:

Within cluster sum of squares by cluster:
[1] 12636160  7631152 10226254
(between_SS / total_SS =  79.2 %)

Is between_SS / total_SS a ratio that indicates the overall throughput of the algorithm?

And with pam:

Objective function:
build     swap 
211.6604 210.5670 

How do you interpret these results?

Upvotes: 0

Views: 446

Answers (1)

G5W

Reputation: 37661

If by "throughput" and "efficiency" you mean anything about processing speed, then no. These are all measures of how successful the clustering algorithm was at finding a good grouping (or perhaps how well these points can be grouped).

k-means
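The meaning of between_SS (between-cluster sum of squares) and total_SS (total sum of squares) has been explained in this previous Cross Validated question and its answers. The ratio of between_SS to total_SS is the share of the total sum of squares that is accounted for by the separation between clusters, so a value closer to 100% (here 79.2%) means tighter, better-separated clusters. The ratio tends to rise as the number of clusters increases, so it is most useful for comparing clusterings that use the same k.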
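To make the ratio concrete, here is a minimal sketch in plain Python (not R, and not how kmeans computes it internally). The function names and the toy data are made up for illustration; it relies only on the identity total_SS = between_SS + total within-cluster SS.

```python
def sum_sq(points, center):
    """Sum of squared Euclidean distances from each point to center."""
    return sum(sum((x - c) ** 2 for x, c in zip(p, center)) for p in points)

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def explained_ratio(points, labels):
    """between_SS / total_SS for a given cluster assignment."""
    total_ss = sum_sq(points, centroid(points))
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    # Within-cluster SS: each cluster measured against its own centroid.
    within_ss = sum(sum_sq(m, centroid(m)) for m in clusters.values())
    between_ss = total_ss - within_ss
    return between_ss / total_ss

# Hypothetical toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
ratio = explained_ratio(points, labels)
print(f"between_SS / total_SS = {ratio:.1%}")
```

Relabelling the same points so that each cluster mixes the two pairs (for example labels [0, 1, 0, 1]) gives a much lower ratio, which is the sense in which the number measures clustering quality rather than speed.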

PAM
From the ?pam help page:

the algorithm first looks for a good initial set of medoids (this is called the build phase). Then it finds a local minimum for the objective function, that is, a solution such that there is no single switch of an observation with a medoid that will decrease the objective (this is called the swap phase).

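The values listed are the values of the objective function (sum of distances of points to their medoid) at the end of each of the two phases. Lower is better. Here the swap value (210.5670) is only slightly below the build value (211.6604), so the swap phase improved very little on the initial medoids. Again, this is a measure of how well the points clustered.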
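The two phases can be sketched in plain Python as follows. This is a simplified illustration, not the implementation in R's cluster package: the function names, the greedy build rule, the first-improvement swap rule, and the toy data are all assumptions, and the objective is taken to be the sum of Euclidean distances as described above.

```python
import math

def objective(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)

def build(points, k):
    """Greedy build phase: add the medoid that lowers the objective most."""
    medoids = []
    while len(medoids) < k:
        candidates = [i for i in range(len(points)) if i not in medoids]
        best = min(candidates, key=lambda i: objective(points, medoids + [i]))
        medoids.append(best)
    return medoids

def swap(points, medoids):
    """Swap phase: exchange a medoid with a non-medoid while that helps."""
    medoids = list(medoids)
    best_cost = objective(points, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(len(medoids)):
            for cand in range(len(points)):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = objective(points, trial)
                if cost < best_cost - 1e-12:
                    medoids, best_cost, improved = trial, cost, True
    return medoids

# Hypothetical toy data: two small groups of points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
built = build(points, 2)
final = swap(points, built)
print("build:", objective(points, built))
print("swap: ", objective(points, final))
```

Because the swap phase only accepts exchanges that lower the objective, the second number can never exceed the first, which matches the pattern in the pam output.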

For more details, see the pam help page (?pam), the pam.object help page (?pam.object), the Wikipedia page on k-medoids, or the original paper: Kaufman, L. and Rousseeuw, P.J. (1987), Clustering by Means of Medoids.

Upvotes: 2
