Reputation: 360
I have three data frames and I want to perform a Principal Component Analysis (PCA) in R. I merged the data frames with rbind() and ran a PCA on the result. That worked, but I want to distinguish the points according to the data frame they belong to. With the merged data frame that seems impossible (or is it?). When I use PCA(X=c(df1,df2,df3)) instead, it complains about a differing number of rows (which is actually the case).
pca <- PCA(X=c(df1,df2,df3))
fviz_pca_ind(pca,
geom.ind = "point", # show points only (but not "text")
col.ind = c(df1,df2,df3), # color by groups
palette = c("#00AFBB", "#E7B800", "#FC4E07"),
addEllipses = TRUE, # Concentration ellipses
legend.title = "Groups"
)
That is not working...
How can I perform a PCA with data from three different data frames and color the points by the data frame they come from? I have no reprex because it is difficult to provide one in this case.
Thank you all for your suggestions ;)
Upvotes: 2
Views: 1983
Reputation: 46898
You need to record the number of rows of each data frame. One way is shown below, where I collect the three data frames in a named list:
library(FactoMineR)
library(factoextra)
df1 = subset(iris,Species=="setosa")[,-5]
df2 = subset(iris,Species=="versicolor")[,-5]
df3 = subset(iris,Species=="virginica")[,-5]
X = list(df1=df1,df2=df2,df3=df3)
You combine them with do.call(rbind, X), and you build the labels by repeating the name of each data frame once for each of its rows:
labels = rep(names(X),sapply(X,nrow))
table(labels)
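The iris subsets above all have 50 rows, so it may help to check the same construction on data frames of unequal size, as in the question. This is only a small sketch using base R; the toy data frames d_a, d_b and d_c and the names X_toy, labels_toy and merged_toy are made up for illustration:

```r
# Toy data frames with different numbers of rows (hypothetical data)
set.seed(1)
d_a <- data.frame(x = rnorm(4), y = rnorm(4))
d_b <- data.frame(x = rnorm(6), y = rnorm(6))
d_c <- data.frame(x = rnorm(3), y = rnorm(3))
X_toy <- list(d_a = d_a, d_b = d_b, d_c = d_c)

# One label per row, in the same order in which rbind stacks the rows
labels_toy <- rep(names(X_toy), sapply(X_toy, nrow))
merged_toy <- do.call(rbind, X_toy)

# The label vector should be exactly as long as the merged data frame
length(labels_toy) == nrow(merged_toy)
table(labels_toy)
```

Because rep() and do.call(rbind, ...) both walk through the list in the same order, each label should line up with the row it describes.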
Then you plot, passing labels as col.ind:
pca <- PCA(do.call(rbind,X))
fviz_pca_ind(pca,
geom.ind = "point", # show points only (but not "text")
col.ind = labels, # color by groups
palette = c("#00AFBB", "#E7B800", "#FC4E07"),
addEllipses = TRUE, # Concentration ellipses
legend.title = "Groups"
)
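As for why the original attempt with c(df1,df2,df3) fails: a data frame is a list of columns, so c() concatenates the columns into one plain list instead of stacking the rows. That is presumably where the "differing number of rows" complaint comes from once the list is turned back into a data frame. A minimal base R sketch, with made-up data frames d_a and d_b standing in for the real ones:

```r
# Two hypothetical data frames with the same columns but different row counts
d_a <- data.frame(x = 1:4, y = 5:8)
d_b <- data.frame(x = 1:6, y = 7:12)

# c() joins the columns into a plain list; it does not stack the rows
combined <- c(d_a, d_b)
class(combined)     # a list, no longer a data frame
lengths(combined)   # the four columns have different lengths

# rbind() stacks the rows, which is what PCA needs
stacked <- rbind(d_a, d_b)
dim(stacked)
```

The same reasoning applies to col.ind: it expects one value per row of the merged data, which is why a label vector is passed rather than the data frames themselves.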
Upvotes: 3