I'm a biologist, not a programmer, so please be gentle.
I have a dataset that looks like this:
Genes  Patient1  Patient2  Patient3
A      324       433       343
B      431       342       124
Z      232       234       267
Then I have a sample sheet containing the sample info, like this:
Patient1 - Healthy
Patient2 - Disease
Patient3 - Healthy
I am using:
library(ggfortify)
df <- dataset
pca_res <- prcomp(df, scale. = TRUE)
autoplot(pca_res)
Then I want to do something like:
autoplot(pca_res, data = ?, colour = '?')
I wish to use the info from the sample sheet to color my PCA based on the state (healthy/disease) using the autoplot function. Is there a way to do this?
First, I would create a single data.frame that holds all the available information, with one row per patient, one column per gene and a Status column. For example:
df=structure(list(A = c(324, 433, 343), B = c(431, 342, 124), Z = c(232,
234, 267), Status = c("Healthy", "Disease", "Healthy")), row.names = c("Patient1",
"Patient2", "Patient3"), class = "data.frame")
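Then you can use the factoextra package, which is very handy for plotting PCA results. Run the PCA on the numeric gene columns only, because prcomp() fails on the character Status column:
pca_res <- prcomp(df[, c("A", "B", "Z")], scale. = TRUE)
library(factoextra)
fviz_pca_ind(pca_res, habillage = df$Status)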
See the fviz_pca_ind documentation to customize the colors afterwards.
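Since the question asks specifically about autoplot, here is a minimal sketch of the same plot with ggfortify instead of factoextra, assuming the same patients-by-genes df as above. autoplot() on a prcomp object accepts a data argument holding the original rows, and colour names a column of that data frame, so the Status column drives the colors.

```r
# Minimal sketch: colour a ggfortify PCA plot by disease status.
# Assumes df is the patients-by-genes data frame from this answer,
# with numeric gene columns A, B, Z and a character column Status.
library(ggfortify)

df <- structure(list(A = c(324, 433, 343), B = c(431, 342, 124),
                     Z = c(232, 234, 267),
                     Status = c("Healthy", "Disease", "Healthy")),
                row.names = c("Patient1", "Patient2", "Patient3"),
                class = "data.frame")

# Keep only the numeric columns for the PCA (prcomp cannot use the Status text)
gene_cols <- sapply(df, is.numeric)
pca_res <- prcomp(df[, gene_cols], scale. = TRUE)

# data = df supplies the extra columns; colour = "Status" picks the grouping column
p <- autoplot(pca_res, data = df, colour = "Status")
print(p)  # draw the plot
```

The rows of df must be in the same order as the rows passed to prcomp(), which is automatic here because both come from the same data frame.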
Edit:
To build the combined data frame from your two datasets:
1) Take your first data frame and use the Genes column as the row names:
rownames(df) <- df$Genes
df <- df[, -1]  # remove the Genes column so that only the numeric values remain
2) Format your second data frame, which you will call df2, so that it has the same columns as df (Patient1, Patient2, ...) and a single row, named Status, holding each patient's disease status:
rownames(df2) <- "Status"
df2
        Patient1 Patient2 Patient3
Status   Healthy  Disease  Healthy
Since we don't know the exact format of your sample sheet, you will have to do this reshaping yourself.
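3) Then rbind df and df2, and transpose the result so that the patients become rows:
df3 <- rbind(df, df2)  # rbind() turns every value into character because Status is text
df3 <- data.frame(t(df3), stringsAsFactors = FALSE)  # patients as rows, genes + Status as columns
gene_cols <- setdiff(names(df3), "Status")
df3[gene_cols] <- lapply(df3[gene_cols], as.numeric)  # convert the gene values back to numbers
Then perform the PCA on the gene columns of df3, e.g. prcomp(df3[gene_cols], scale. = TRUE), and colour the points by df3$Status.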
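The three steps above can also be sketched end to end. This variant swaps the rbind-then-transpose route for a slightly different technique: it transposes the numeric matrix first and then attaches Status by looking up each patient's name with match(), so the gene values never become text and the order of the sample sheet does not matter. It assumes the sample sheet has been read into a two-column data frame; the column names Sample and Status are hypothetical, so adjust them to your file.

```r
# Sketch: build the patients-by-genes table from the two inputs and plot the PCA.
# ASSUMPTION: the sample sheet is a two-column data frame; the column
# names "Sample" and "Status" are hypothetical, adjust them to your file.
library(ggfortify)

# Expression table as described in the question: genes in rows, patients in columns
expr <- data.frame(Genes = c("A", "B", "Z"),
                   Patient1 = c(324, 431, 232),
                   Patient2 = c(433, 342, 234),
                   Patient3 = c(343, 124, 267))

sample_sheet <- data.frame(Sample = c("Patient1", "Patient2", "Patient3"),
                           Status = c("Healthy", "Disease", "Healthy"))

# Genes become row names; keep only the numeric values
mat <- as.matrix(expr[, -1])
rownames(mat) <- expr$Genes

# Transpose so each patient is a row (an observation) and each gene a column
df3 <- as.data.frame(t(mat))

# Look up each patient's status by name rather than relying on row order
df3$Status <- sample_sheet$Status[match(rownames(df3), sample_sheet$Sample)]

# PCA on the gene columns only, then colour the points by Status
gene_cols <- setdiff(names(df3), "Status")
pca_res <- prcomp(df3[, gene_cols], scale. = TRUE)
p <- autoplot(pca_res, data = df3, colour = "Status")
print(p)  # draw the plot
```

If a patient in the expression table is missing from the sample sheet, match() returns NA and that patient gets an NA status, which makes the mismatch easy to spot.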