Siebe Albers

Reputation: 83

Why does the Silhouette_score require labels as input?

Why is it wrong to think that it only needs the data? After all, it "outputs a measure of how similar an object is to its own cluster (cohesion) compared to other clusters (separation)."

However, I also have to pass in the labels (which I thought the function computes itself). So why are the labels a required input?

Upvotes: 0

Views: 818

Answers (2)

Toukenize

Reputation: 1420

silhouette_score is a metric for clustering quality, not a clustering algorithm. It considers both the inter-cluster and intra-cluster distances.

For that calculation to happen, you need to supply both the data and the cluster labels (typically predicted by an unsupervised method such as K-means).
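A minimal sketch of that two-step workflow, assuming scikit-learn and NumPy are installed. The toy data and the choice of two clusters are made up for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Toy data (invented for this example): two well-separated groups of points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])

# Step 1: a clustering algorithm produces the labels.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Step 2: the metric scores that particular assignment of points to clusters.
score = silhouette_score(X, labels)
print(labels, score)
```

The labels could just as well come from any other clustering method. silhouette_score only evaluates the assignment it is given.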

Upvotes: 1

Thomas Schillaci

Reputation: 2453

"how similar an object is to its own cluster"

In order to compute the silhouette, you need to know to which cluster your samples belong.

Also:

The Silhouette Coefficient is calculated using the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample. The Silhouette Coefficient for a sample is (b - a) / max(a, b). To clarify, b is the distance between a sample and the nearest cluster that the sample is not a part of.

You need the labels to know what "intra-cluster" and "nearest-cluster" mean.
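To make this concrete, here is a rough from-scratch sketch of the (b - a) / max(a, b) formula in plain Python. The helper name silhouette_samples_manual and the toy points are hypothetical, and the sketch assumes at least two clusters and no duplicate points:

```python
from math import dist

def silhouette_samples_manual(points, labels):
    """Per-sample silhouette (b - a) / max(a, b), computed from scratch."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        # Mean distance from p to the members of each cluster (p itself excluded).
        mean_dist = {}
        for c in clusters:
            members = [q for j, q in enumerate(points) if labels[j] == c and j != i]
            if members:
                mean_dist[c] = sum(dist(p, q) for q in members) / len(members)
        own = labels[i]
        if own not in mean_dist:
            scores.append(0.0)  # p is alone in its cluster: score 0 by convention
            continue
        a = mean_dist[own]                                     # intra-cluster distance
        b = min(d for c, d in mean_dist.items() if c != own)   # nearest other cluster
        scores.append((b - a) / max(a, b))
    return scores

# Same points, two different labelings (both invented for illustration).
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good_labels = [0, 0, 1, 1]   # groups nearby points together
bad_labels = [0, 1, 0, 1]    # splits each natural group

good = sum(silhouette_samples_manual(points, good_labels)) / len(points)
bad = sum(silhouette_samples_manual(points, bad_labels)) / len(points)
print(good, bad)
```

The data is identical in both calls, yet the first labeling scores close to 1 and the second scores below 0. Both a and b are defined only relative to a labeling, so the data alone cannot determine the score.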

Upvotes: 1
