Reputation: 13
I am asking a question to a similar post posted up 2 years ago, with no full answer to it (subset of prcomp object in R). P.S. sorry for commenting on it for an answer..
Basically, my question is the same. I have generated a PCA table using prcomp that has 10000+ genes, and 1700+ cells, made up of 7 timepoints. Plotting all of them in a single file makes it difficult to see.
I would like to plot each timepoint separately, using the same PCA results table (ie without re-running prcomp).
Thanks Dean for giving me tips on posting. To think of a way to describe my dataset without actually loading it here, will take me a week I believe. I also tried the
dput(droplevels(head(object,2)))
option, but it was just too much info since I have such a large dataset. In short, it is a large matrix of single-cell dataset where people can commonly see on packages such as Seurat (https://satijalab.org/seurat/pbmc3k_tutorial_1_4.html). EDIT: I have posted a screenshot of a subset of my matrix here ().
Sorry I don't know how to re-create this or even export a text format.. But this is what I can provide: My TPM matrix has 16541 rows (defining genes), and 1798 columns (defining cells).
In it, I have "re-labelled" my columns based on timepoints, using codes such as:
D0<-c(colnames(TPM[,grep("20180419-24837-1-*", colnames(TPM))])) #D0: 286 cells
D7<-c(colnames(TPM[,grep("20180419-24837-2-*", colnames(TPM))])) #D7: 237 cells
D10<-c(colnames(TPM[,grep("20180419-24947-5-*", colnames(TPM))])) #D10: 304 cells
...... and I continued to label each timepoint.
Each timepoint was also given a specific colour.
rc<-rep("white", ncol(TPM))
rc<-[,grep("20180419-24837-1-*", colnames(TPM))]= "magenta"
...... and I continued to give colour to each timepoint.
I performed a PCA using this code:
pcaRes<-prcomp(t(log(TPM+1)), center= TRUE, scale. = TRUE)
Then I proceeded to plot a PCA plot using:
plot(pcaRes$x[,1], pcaRes$x[,2], xlab="PC1", ylab="PC2",
cex=1.0, col= rc, pch=16, main="")
Then I when I wanted to plot a PCA plot only with D0, using the same PCA output (pcaRes).. This is where I am stuck.
P.S. If anyone else has an easier way of advising how to input an example data here from my large matrix, I welcome any help. Thanks so much! Sorry I am very new in bioinformatics.
Upvotes: 1
Views: 852
Reputation: 13
thanks @ConradThiele for your suggestion, I will check out that site.
I had a chat with other bioinformatics around the institute. My query has little to do with the object being an S4 class, since I am performing prcomp outside of the package. I have extracted my matrix out of the object and then ran prcomp on it.
Solution is simple: run prcomp with full dataset, transform the prcomp output into a dataframe, input additional columns to input additional details like "timepoint", create new dataframe(s) only with the "timepoint"/ "variable" of interest from the prcomp result, make multiple sub-dataframe and then plotting these using "plot" or whatever function you use.
This was not my solution but from a bioinformatition I went for help to in my institute. Hope this helps others! Thanks again for your time.
P.S. If I have the time, I will post a copy of the code I suggested soon.
Upvotes: 0
Reputation: 585
Stack Exchange for Bioinformatics is where you you will need to go to ask question(s) or learn about the package(s) and function(s) you need to deal with you area of specialty. Stack Exchange for Bioinformatics is linked with Stackoverflow so you will just need to join, you'll have the same login.
Classes S3, S4 and Base.
This Very basic over view of Classes in R. Think of a Class as the parent you inherit all of their skills or abilities from and as a result you are able to achieve certain tasks better than others and some cases, you will not be able to do the task at all.
In R and all programming, to save re-inventing the wheel, parent classes are created so that the average person does not have to repeatedly write a function to do something simple like plot() a graph. This stuff is hidden, to access it, you inherit from the parent. The child reads the traits off the parent(s), and then it either performs the task or gives you a cryptic error message.
Base and S3 classes work well together, they are like the working class people of the R world. S4 is a specialized class made for specific fields of study to be able to provide specific functionality needed in their industry. This mean you can only use certain Base and S3 functions with Class S4 functions, most are just not compatible. So it's nothing you've done wrong, plot() and ggplot() just have the wrong parent(s) to work with your dataset.
Typical Base and S3 Class dataframe: Box like structure. Along the left hand side is all the column names, nice and neatly stacked on top of each other.
Seurat S4 Class dataframe: Tree like structure, formatted to be read by a specific function(s).
Well hope that helps and I wish you well in your career. Cheers Conrad
Ps if this helps, then click the arrow up. :)
Upvotes: 0