Reputation: 149
I searched for a long time for a straightforward explanation of distance versus correlation biplots, and of how to transform the standard outputs of PCA to produce each one. All the Stack Overflow explanations I saw went way over my head with math terms. How can I create both a distance biplot and a correlation biplot using the outputs of R's prcomp?
Upvotes: 1
Views: 737
Reputation: 149
The best explanation I found is some lecture slides from Pierre Legendre, Département de sciences biologiques, Université de Montréal (http://biol09.biol.umontreal.ca/PLcourses/Ordination_section_1.1_PCA_Eng.pdf). However, while these slides did show the way to plot a distance and correlation biplot manually, they didn't show how to plot the distance and correlation biplots from the results of prcomp.
So I worked through an example showing how to use the outputs of prcomp to reproduce the example walked through in the PDF above. I am leaving this here for future people like myself who are wondering how to plot a distance versus a correlation biplot, and when to use each (according to Pierre Legendre).
set.seed(1)
#Run standard PCA (scale. is the full argument name; "scale" only works via partial matching)
pca_res <- prcomp(mtcars[, 1:7], center = TRUE, scale. = TRUE, retx = TRUE)
#To print a distance biplot, simply plot pca_res$x as points and pca_res$rotation
#as vectors
library(ggplot2)
arrow_len <- 3 #arbitrary scaling so the arrows are on the same scale as the PC scores
ggplot(data = as.data.frame(pca_res$x), aes(x = PC1, y = PC2)) +
  geom_point() +
  #PC1 goes on the x axis and PC2 on the y axis, matching the points
  geom_segment(data = as.data.frame(pca_res$rotation),
               aes(x = 0, y = 0, xend = arrow_len*PC1, yend = arrow_len*PC2),
               arrow = arrow(length = unit(0.02, "npc"))) +
  geom_text(data = as.data.frame(pca_res$rotation),
            mapping = aes(x = arrow_len*PC1, y = arrow_len*PC2,
                          label = row.names(pca_res$rotation)))
#This is equivalent to the following steps:
Y_centered <- scale(mtcars[, 1:7], center = TRUE, scale = TRUE)
Y_eig <- eigen(cov(Y_centered))
#Note that Y_eig$vectors matches pca_res$rotation ("rotations" or "loadings")
# up to the sign of each column, and Y_eig$values (eigenvalues) == pca_res$sdev^2
#For a distance biplot
U_frame <- Y_eig$vectors
#F is your PC scores, obtained by multiplying your centered and scaled data by the rotations
F_frame <- Y_centered %*% U_frame
#flipping constants if needed bc PC axis direction is arbitrary
x_flip <- -1
y_flip <- -1
ggplot(data = as.data.frame(F_frame), aes(x = x_flip*V1, y = y_flip*V2)) +
  geom_point() +
  geom_segment(data = as.data.frame(U_frame),
               aes(x = 0, y = 0, xend = x_flip*arrow_len*V1, yend = y_flip*arrow_len*V2),
               arrow = arrow(length = unit(0.02, "npc"))) +
  geom_text(data = as.data.frame(U_frame),
            mapping = aes(x = x_flip*arrow_len*V1, y = y_flip*arrow_len*V2,
                          label = colnames(Y_centered)))
#To print a correlation biplot, matrix multiply your rotations/loadings
# by a diagonal matrix of your PCA standard deviations
# (equivalent to the sqrt of your eigenvalues)
U_frame_scaling2 <- U_frame %*% diag(Y_eig$values^(0.5))
#And divide your PC scores by your PCA standard deviations
# (equivalent to multiplying by 1/sqrt(eigenvalues))
F_frame_scaling2 <- F_frame %*% diag(Y_eig$values^(-0.5))
#Plot the correlation biplot directly from the prcomp outputs
arrow_len <- 1.5 #arbitrary scaling so the arrows are on the same scale as the PC scores
ggplot(data = as.data.frame(pca_res$x %*% diag(1/pca_res$sdev)),
       aes(x = V1, y = V2)) +
  geom_point() +
  geom_segment(data = as.data.frame(pca_res$rotation %*% diag(pca_res$sdev)),
               aes(x = 0, y = 0, xend = arrow_len*V1, yend = arrow_len*V2),
               arrow = arrow(length = unit(0.02, "npc"))) +
  geom_text(data = as.data.frame(pca_res$rotation %*% diag(pca_res$sdev)),
            mapping = aes(x = arrow_len*V1, y = arrow_len*V2,
                          label = row.names(pca_res$rotation)))
ggplot(data = as.data.frame(F_frame_scaling2), aes(x = x_flip*V1, y = y_flip*V2)) +
  geom_point() +
  geom_segment(data = as.data.frame(U_frame_scaling2),
               aes(x = 0, y = 0, xend = x_flip*arrow_len*V1, yend = y_flip*arrow_len*V2),
               arrow = arrow(length = unit(0.02, "npc"))) +
  geom_text(data = as.data.frame(U_frame_scaling2),
            mapping = aes(x = x_flip*arrow_len*V1, y = y_flip*arrow_len*V2,
                          label = colnames(Y_centered)))
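The equivalences claimed in the comments above can be checked numerically. This is a minimal sketch, assuming the same seven mtcars columns; the names X, pca, eig and the logical flags are illustrative and not part of the walkthrough. Eigenvectors and scores are compared in absolute value because the sign of each axis is arbitrary.

```r
# Standardize the data the same way prcomp() does with center/scale. = TRUE
X   <- scale(mtcars[, 1:7], center = TRUE, scale = TRUE)
pca <- prcomp(mtcars[, 1:7], center = TRUE, scale. = TRUE)
eig <- eigen(cov(X))

# Eigenvalues are the squared standard deviations of the PCs
eigenvalues_match <- isTRUE(all.equal(pca$sdev^2, eig$values))

# Eigenvectors equal the rotation matrix up to the sign of each column
vectors_match <- isTRUE(all.equal(abs(unname(pca$rotation)), abs(eig$vectors)))

# Scaling 1 scores: X %*% U reproduces pca$x up to sign
scores1_match <- isTRUE(all.equal(abs(unname(pca$x)),
                                  abs(unname(X %*% eig$vectors))))

# Scaling 2 scores: dividing each axis by its sdev gives unit variance
scores2 <- pca$x %*% diag(1 / pca$sdev)
unit_variance <- isTRUE(all.equal(unname(apply(scores2, 2, var)), rep(1, 7)))

print(c(eigenvalues_match, vectors_match, scores1_match, unit_variance))
```

If any flag comes back FALSE on other data, a likely cause is tied or near-zero eigenvalues, where the eigenvectors are no longer unique up to sign.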
As for the differences between the two (in case the PDF above becomes unavailable at some point):
Scaling type 1: distance biplot, used when the interest is in the positions of the objects with respect to one another.
Scaling type 2: correlation biplot, used when the angular relationships among the variables are of primary interest.
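Both properties can be illustrated numerically. This is a rough sketch, assuming standardized variables as in the code above and keeping all seven axes; a two-axis plot only approximates these relationships. The variable names are illustrative.

```r
X   <- scale(mtcars[, 1:7], center = TRUE, scale = TRUE)
pca <- prcomp(mtcars[, 1:7], center = TRUE, scale. = TRUE)

# Scaling 1: distances among objects in PC space equal their Euclidean
# distances in the standardized data (exact when all axes are kept)
dist_preserved <- isTRUE(all.equal(as.vector(dist(pca$x)),
                                   as.vector(dist(X))))

# Scaling 2: variable arrows are the loadings multiplied by the PC sdevs
arrows2 <- pca$rotation %*% diag(pca$sdev)

# With standardized variables every arrow has length 1 in the full space,
# so inner products between arrows are the cosines of the angles between
# them, and those cosines equal the correlations among the variables
cosines <- arrows2 %*% t(arrows2)
angles_are_correlations <- isTRUE(all.equal(unname(cosines),
                                            unname(cor(mtcars[, 1:7]))))

print(c(dist_preserved, angles_are_correlations))
```

In practice this suggests reading object positions from the scaling 1 plot and variable relationships from the scaling 2 plot, keeping in mind that two axes capture only part of the variance.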
Upvotes: 2