melbez

Reputation: 1000

Visualizing PCA with large number of variables in R using ggbiplot

I am trying to visualize a PCA that includes 87 variables.

library(ggbiplot)

prc <- prcomp(df[, 1:87], center = TRUE, scale. = TRUE)
ggbiplot(prc, labels = rownames(df[, 1:87]), var.axes = TRUE)

When I create the biplot, many of the vectors overlap with each other, making it impossible to read the labels. I was wondering if there is any way to only show some of the labels at a time. For example, I think it'd be useful if I could create a few separate biplots with each one showing only a subset of the labels on the vectors.

This question seems closely related, but I don't know if it translates to the latest version of ggbiplot. I'm also not sure how to modify the original functions.

Upvotes: 2

Views: 2992

Answers (1)

xilliam

Reputation: 2259

A potential solution is to use the factoextra package to visualize your PCA results. The fviz_pca_biplot() function includes a repel argument. When repel = TRUE, the plot labels are spread out to minimize overlap. The documentation also describes a select.var argument: for example, select.var = list(contrib = 5) displays only the 5 variables with the highest contributions. There is also a select.var = list(name = ...) option, which takes a character vector of variable names and so lets you show a specific subset of variables.

# read data: keep 9 numeric columns of mtcars
df <- mtcars[, c(1:7, 10:11)]

# perform PCA
library(FactoMineR)
res.pca <- PCA(df, graph = FALSE)

# visualize: repel the labels and draw only the 5 top-contributing variables
library(factoextra)
fviz_pca_biplot(res.pca, repel = TRUE, select.var = list(contrib = 5))

[Figure: biplot with only 5 vectors shown]
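
To get several separate biplots, each showing a different subset of the variable vectors (as asked in the question), the select.var = list(name = ...) option can be applied in a loop. The following is only a sketch under assumptions: the group size of 3 and the names vars, groups, and plots are arbitrary choices for illustration, and it reuses the mtcars example rather than the 87-variable data.

```r
library(FactoMineR)
library(factoextra)

# Same example data and PCA as above
df <- mtcars[, c(1:7, 10:11)]
res.pca <- PCA(df, graph = FALSE)

# Split the variable names into consecutive groups of 3
# (group size is an arbitrary choice for this illustration)
vars <- colnames(df)
groups <- split(vars, ceiling(seq_along(vars) / 3))

# Build one biplot per group; each draws only that group's vectors
plots <- lapply(groups, function(v) {
  fviz_pca_biplot(res.pca, repel = TRUE, select.var = list(name = v))
})

# Display the first biplot; print the others the same way
print(plots[[1]])
```

Each element of plots is a ggplot object, so the individual plots can be printed one at a time or saved with ggsave(). The PCA itself is computed once on all variables; only the set of vectors drawn changes between plots.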

Upvotes: 5
