user1158903
user1158903

Reputation:

PCA FactoMineR plot data

I'm running an R script generating plots of the PCA analysis using FactorMineR.

I'd like to output the coordinates for the generated PCA plots but I'm having trouble finding the right coordinates. I found results1$ind$coord and results1$var$coord but neither look like the default plot.

I found http://www.statistik.tuwien.ac.at/public/filz/students/seminar/ws1011/hoffmann_ausarbeitung.pdf and http://factominer.free.fr/classical-methods/principal-components-analysis.html but neither describe the contents of the variable created by the PCA

library(FactoMineR)
data1 <- read.table(file=args[1], sep='\t', header=T, row.names=1)
result1 <- PCA(data1,ncp = 4, graph=TRUE) # graphs generated automatically
plot(result1)

Upvotes: 6

Views: 10686

Answers (4)

Beeflight31
Beeflight31

Reputation: 249

Recently I developped a user friendly R package named "GABB", to perform simple and nice PCA, including biplot. For the Biplot, use the argument Biplot.PCA = TRUE. Check the following example with mtcars data set :

library(GABB)
library(FactoMiner)


## Example of GABB package pipeline with the base data.set "mtcars" 
my.data <- mtcars

## Data preparation for RDA and PCA : tranformation and scaling of numeric/quantitative variables

prep_data(data = my.data, quantitative_columns = c(1:7), transform_data_method = "log", scale_data = T)

## Create PCA
library(FactoMineR)
my.pca <- FactoMineR::PCA(X = data_quant) 


## Create, display and save graphic output of individual and variable PCA

#Basic output with minimum required parameters
PCA_RDA_graphics(complete.data.set = initial_data_with_quant_transformed, PCA.object = my.pca, factor.names = c("vs", "am", "gear", "carb"))

#Advanced outputs (image below)
PCA_RDA_graphics(complete.data.set = initial_data_with_quant_transformed, PCA.object = my.pca, 
                 factor.names = c("vs", "am", "gear", "carb"), Biplot.PCA = TRUE,col.arrow.var.PCA = "grey",
                 Barycenter = TRUE, Segments = TRUE, Ellipse.IC.95 = TRUE,
                 Barycenter.Ellipse.Fac1 = "vs", Barycenter.Ellipse.Fac2 = "am",
                 factor.colors = "vs", factor.shapes = "am",
                 Barycenter.factor.col = "vs", Barycenter.factor.shape = "am")

enter image description here

Upvotes: 0

Ben
Ben

Reputation: 42283

I found that $ind$coord[,1] and $ind$coord[,2] are the first two pca coords in the PCA object. Here's a worked example that includes a few other things you might want to do with the PCA output...

# Plotting the output of FactoMineR's PCA using ggplot2
#
# load libraries
library(FactoMineR)
library(ggplot2)
library(scales)
library(grid)
library(plyr)
library(gridExtra)
#
# start with a clean slate
rm(list=ls(all=TRUE)) 
#
# load example data
data(decathlon)
#
# compute PCA
res.pca <- PCA(decathlon, quanti.sup = 11:12, quali.sup=13, graph = FALSE)
#
# extract some parts for plotting
PC1 <- res.pca$ind$coord[,1]
PC2 <- res.pca$ind$coord[,2]
labs <- rownames(res.pca$ind$coord)
PCs <- data.frame(cbind(PC1,PC2))
rownames(PCs) <- labs
#
# Just showing the individual samples...
ggplot(PCs, aes(PC1,PC2, label=rownames(PCs))) + 
  geom_text() 

enter image description here

# Now get supplementary categorical variables
cPC1 <- res.pca$quali.sup$coor[,1]
cPC2 <- res.pca$quali.sup$coor[,2]
clabs <- rownames(res.pca$quali.sup$coor)
cPCs <- data.frame(cbind(cPC1,cPC2))
rownames(cPCs) <- clabs
colnames(cPCs) <- colnames(PCs)
#
# Put samples and categorical variables (ie. grouping
# of samples) all together
p <- ggplot() + theme(aspect.ratio=1) + theme_bw(base_size = 20) 
# no data so there's nothing to plot...
# add on data 
p <- p + geom_text(data=PCs, aes(x=PC1,y=PC2,label=rownames(PCs)), size=4) 
p <- p + geom_text(data=cPCs, aes(x=cPC1,y=cPC2,label=rownames(cPCs)),size=10)
p # show plot with both layers

enter image description here

# Now extract the variables
#
vPC1 <- res.pca$var$coord[,1]
vPC2 <- res.pca$var$coord[,2]
vlabs <- rownames(res.pca$var$coord)
vPCs <- data.frame(cbind(vPC1,vPC2))
rownames(vPCs) <- vlabs
colnames(vPCs) <- colnames(PCs)
#
# and plot them
#
pv <- ggplot() + theme(aspect.ratio=1) + theme_bw(base_size = 20) 
# no data so there's nothing to plot
# put a faint circle there, as is customary
angle <- seq(-pi, pi, length = 50) 
df <- data.frame(x = sin(angle), y = cos(angle)) 
pv <- pv + geom_path(aes(x, y), data = df, colour="grey70") 
#
# add on arrows and variable labels
pv <- pv + geom_text(data=vPCs, aes(x=vPC1,y=vPC2,label=rownames(vPCs)), size=4) + xlab("PC1") + ylab("PC2")
pv <- pv + geom_segment(data=vPCs, aes(x = 0, y = 0, xend = vPC1*0.9, yend = vPC2*0.9), arrow = arrow(length = unit(1/2, 'picas')), color = "grey30")
pv # show plot 

enter image description here

# Now put them side by side in a single image
#
grid.arrange(p,pv,nrow=1)
# 
# Now they can be saved or exported...

enter image description here

Upvotes: 7

William Revelle
William Revelle

Reputation: 1230

An alternative is to use the biplot function from CoreR or biplot.psych from the psych package. This will put the components and the data onto the same figure.

For the decathlon data set, use principal and biplot from the psych package:

 library(FactoMineR) #needed to get the example data
 library(psych)  #needed for principal 
 data(decathlon)  #the data set
 pc2 <- principal(decathlon[1:10],2) #just the first 10 columns
 biplot(pc2,labels = rownames(decathlon),cex=.5, main="Biplot of Decathlon results") 
 #this is a call to biplot.psych which in turn calls biplot.
 #adjust the cex parameter to change the type size of the labels.

This looks like:

!a biplot http://personality-project.org/r/images/olympic.biplot.pdf

Bill

Upvotes: 1

Sandy Muspratt
Sandy Muspratt

Reputation: 32789

Adding something extra to Ben's answer. You'll note in the first chart in Ben's response that the labels overlap somewhat. The pointLabel() function in the maptools package attempts to find locations for the labels without overlap. It's not perfect, but you can adjust the positions in the new dataframe (see below) to fine tune if you want. (Also, when you load maptools you get a note about gpclibPermit(). You can ignore it if you're concerned about the restricted licence). The first part of the script below is Ben's script.

# load libraries
library(FactoMineR)
library(ggplot2)
library(scales)
library(grid)
library(plyr)
library(gridExtra)
#
# start with a clean slate
# rm(list=ls(all=TRUE)) 
#
# load example data
data(decathlon)
#
# compute PCA
res.pca <- PCA(decathlon, quanti.sup = 11:12, quali.sup=13, graph = FALSE)
#
# extract some parts for plotting
PC1 <- res.pca$ind$coord[,1]
PC2 <- res.pca$ind$coord[,2]
labs <- rownames(res.pca$ind$coord)
PCs <- data.frame(cbind(PC1,PC2))
rownames(PCs) <- labs 
#

# Now, the code to produce Ben's first chart but with less overlap of the labels.

library(maptools)

PCs$label=rownames(PCs)

# Base plot first for pointLabels() to get locations
plot(PCs$PC1, PCs$PC2, pch = 20, col = "red")
new = pointLabel(PCs$PC1, PCs$PC2, PCs$label, cex = .7)
new = as.data.frame(new)
new$label = PCs$label

# Then plot using ggplot2
(p = ggplot(data = PCs) + 
   geom_hline(yintercept = 0, linetype = 3, colour = "grey20") +
   geom_vline(xintercept = 0, linetype = 3, colour = "grey20") +
   geom_point(aes(PC1, PC2), shape = 20, col = "red") +
   theme_bw())

(p = p +  geom_text(data = new, aes(x, y, label = label), size = 3))

The result is:

enter image description here

Upvotes: 1

Related Questions