Reputation: 83
Why is it wrong to think that it only needs the data since it: "outputs a measure of how similar an object is to its own cluster (cohesion) compared to other clusters (separation)."
However, I also need to input the labels (which the function itself computes); so, why are the labels necessary to input?
Upvotes: 0
Views: 818
Reputation: 1420
Silhouette_score
is a metric for clustering quality, not a clustering algorithm. It considers both the inter-class and intra-class distance.
For that calculation to happen, you need to supply both the data and target labels (estimated by unsupervised methods like K-means
).
Upvotes: 1
Reputation: 2453
how similar an object is to its own cluster
In order to compute the silhouette, you need to know to which cluster your samples belong.
Also:
The Silhouette Coefficient is calculated using the mean intra-cluster distance (
a
) and the mean nearest-cluster distance (b
) for each sample. The Silhouette Coefficient for a sample is(b - a) / max(a, b)
. To clarify,b
is the distance between a sample and the nearest cluster that the sample is not a part of.
You need the labels to know what "intra-cluster" and "nearest-cluster" mean.
Upvotes: 1