raumkundschafter

Reputation: 441

How to calculate centroids in PCA?

I want to compare the centroid vectors of the groups in a PCA, so I'm looking for a way to calculate the centroid of each group on each PC. I don't need a graphical solution, but I've included a plot in the MWE to make the problem more concrete.

library(ggbiplot)
data(wine)
wine.pca <- prcomp(wine, center = TRUE, scale. = TRUE)
print(ggbiplot(wine.pca, obs.scale = 1, var.scale = 1, groups = wine.class, ellipse = TRUE, circle = TRUE))

Upvotes: 2

Views: 5241

Answers (1)

raumkundschafter

Reputation: 441

This example builds a data frame with the coordinates of the group centroids in PC space, which can then be used to calculate distances between the centroids on any subset of PCs:

library(ggbiplot)
data(wine)
wine.pca <- prcomp(wine, center = TRUE, scale. = TRUE)
# PC scores of every observation (one column per PC)
df.wine.x <- as.data.frame(wine.pca$x)
df.wine.x$groups <- wine.class
# Centroid = mean score per group on each of the 13 PCs
pca.centroids <- aggregate(df.wine.x[, 1:13], list(Type = df.wine.x$groups), mean)

For example, the Euclidean distance between the barolo and grignolino centroids on the first two PCs can be calculated as follows:

# Columns 2:3 hold PC1 and PC2 (column 1 is Type)
dist(rbind(pca.centroids[pca.centroids$Type == "barolo", 2:3],
           pca.centroids[pca.centroids$Type == "grignolino", 2:3]),
     method = "euclidean")
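The same idea extends to all pairs of groups at once, since dist() returns every pairwise distance when given the whole centroid table. Below is a minimal sketch under some assumptions: pca_centroids is a hypothetical helper name that is not part of the code above, and the built-in iris data stands in for wine so that the sketch runs without ggbiplot.

```r
# Hypothetical helper (an assumption, not from the answer above):
# mean score per group on the first n_pc principal components.
pca_centroids <- function(pca, groups, n_pc = 2) {
  scores <- as.data.frame(pca$x[, seq_len(n_pc), drop = FALSE])
  centroids <- aggregate(scores, list(Type = groups), mean)
  # Row names become the labels of the distance matrix
  rownames(centroids) <- as.character(centroids$Type)
  centroids
}

# Built-in iris data used as a stand-in for the wine data
iris.pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
iris.centroids <- pca_centroids(iris.pca, iris$Species, n_pc = 2)

# Drop the Type column; dist() then gives all pairwise centroid distances
centroid.dist <- dist(iris.centroids[, -1], method = "euclidean")
print(centroid.dist)
```

With the wine example, the equivalent call would presumably be pca_centroids(wine.pca, wine.class, n_pc = 2). Increasing n_pc includes more components in the distance.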

Upvotes: 2
