Compute between clusters sum of squares (BCSS) and total sum of squares manually (clustering in R)

Question

I am trying to manually retrieve some of the statistics associated with clustering solutions, based only on the data and the cluster assignments.

For instance, kmeans() computes the between clusters and total sum of squares.

data <- iris[1:4]
  
fit <- kmeans(data, 3)
clusters <- fit$cluster

fit$betweenss
#> [1] 602.5192
fit$totss
#> [1] 681.3706

Created on 2021-08-09 by the reprex package (v2.0.1)

I would like to recover these values without calling kmeans(), using only data and the vector clusters, so that I can apply the same computation to any clustering solution.

Thanks to this other post, I managed to retrieve the within-cluster sum of squares, so I now lack only the between-cluster and total sums of squares. For those, that other post says:

The total sum of squares, sum_x sum_y ||x-y||² is constant.

The total sum of squares can be computed trivially from variance.

If you now subtract the within-cluster sum of squares where x and y belong to the same cluster, then the between cluster sum of squares remains.

But I don't know how to translate that to R... Any help is appreciated.
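
For what it's worth, here is a minimal sketch of how the quoted recipe could be translated to R. It assumes Euclidean distance and the same definitions kmeans() uses: the total sum of squares is the sum of squared deviations from the overall centroid, and the between-cluster sum of squares is the total minus the pooled within-cluster sum of squares. The function name cluster_ss and the names of its list elements are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: recover sums of squares from data and cluster labels only.
cluster_ss <- function(data, clusters) {
  x <- as.matrix(data)

  # Total SS: squared deviations of every point from the overall centroid.
  # Equivalent to (nrow(x) - 1) * sum(apply(x, 2, var)), i.e. "from variance".
  totss <- sum(scale(x, center = TRUE, scale = FALSE)^2)

  # Within SS per cluster: squared deviations from each cluster's own centroid.
  withinss <- vapply(
    split(as.data.frame(x), clusters),
    function(g) sum(scale(g, center = TRUE, scale = FALSE)^2),
    numeric(1)
  )
  tot_withinss <- sum(withinss)

  # Between SS is what remains after removing the within-cluster part.
  list(
    totss        = totss,
    withinss     = withinss,
    tot.withinss = tot_withinss,
    betweenss    = totss - tot_withinss
  )
}

# Usage: compare against kmeans() on the data from the question.
data <- iris[1:4]
fit <- kmeans(data, 3)
ss <- cluster_ss(data, fit$cluster)
all.equal(ss$totss, fit$totss)
all.equal(ss$betweenss, fit$betweenss)
```

Because the function only needs the data and a vector of labels, the same call should work for labels from other methods, for example cutree() applied to an hclust() result.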
