Reputation: 1117
I want to test the quality of my biological replicates by calculating the Pearson correlation coefficient for each pair of biological replicates within each cell line.
A<-data.frame(A1=rnorm(100), A2=rnorm(100),
A3=rnorm(100), B1=rnorm(100),
B2=rnorm(100))
Some cell lines have two replicates and others have three, and the data contain no missing values. How can I produce a plot that compares the replicates?
Upvotes: 0
Views: 1580
Reputation: 15163
Here's one possible way. There's probably a more concise way to do this, though.
First thing, figure out which columns are replicates of which.
fullnames<-colnames(A)
basenames<-substr(fullnames,1,nchar(fullnames)-1)
repnum<-as.integer(substr(fullnames,nchar(fullnames),nchar(fullnames)))
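The substr() calls above assume that every replicate number is a single digit. As a minimal sketch for longer numbers, assuming each column name ends in its replicate number, the same split can be done with regular expressions. The column names and the example* variables here are hypothetical, chosen only to show a two-digit case:

```r
# Hypothetical column names; the last one has a two-digit replicate number.
exampleNames <- c("A1", "A2", "A3", "B1", "B2", "HeLa10")

# Drop the trailing run of digits to get the cell-line name.
exampleBase <- sub("[0-9]+$", "", exampleNames)

# Drop everything up to the last non-digit to keep only the replicate number.
exampleRep <- as.integer(sub("^.*[^0-9]", "", exampleNames))
```

With single-digit names such as A1 or B2 this should give the same result as the substr() version.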
Now compute the correlation matrix, and extract the data you need:
ca<-cor(A)
corMask<-upper.tri(ca) & basenames[col(ca)]==basenames[row(ca)]
corSub<-ca[corMask]
nameSub<-basenames[row(ca)[corMask]]
repnumSub<-apply(cbind(repnum[row(ca)[corMask]],repnum[col(ca)[corMask]]),1,paste,collapse="-")
Then draw the plot:
library(ggplot2)
plotdata<-data.frame(name=nameSub,cor=corSub,replicas=repnumSub)
ggplot(plotdata,aes(x=name,y=cor,pch=replicas))+geom_point()
Here's what it looks like, with the following sample data set:
set.seed(123)
A<-data.frame(A1=rnorm(100), A2=rnorm(100),A3=rnorm(100),
B1=rnorm(100),B2=rnorm(100),
C1=rnorm(100),C2=rnorm(100),C3=rnorm(100))
You can then add color or change the plot limits etc. to make it look the way you want.
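As for a more concise route, here is one possible sketch that skips the full correlation matrix and correlates the replicate pairs within each cell line directly with combn(). It assumes every cell line has at least two replicate columns and that column names end in the replicate number. The names groups, repOf and pairCors are made up for this example:

```r
set.seed(123)
A <- data.frame(A1 = rnorm(100), A2 = rnorm(100), A3 = rnorm(100),
                B1 = rnorm(100), B2 = rnorm(100),
                C1 = rnorm(100), C2 = rnorm(100), C3 = rnorm(100))

# Group column names by cell line (the name without its trailing digits).
groups <- split(colnames(A), sub("[0-9]+$", "", colnames(A)))

# Helper (hypothetical name): extract the trailing replicate number as text.
repOf <- function(x) sub("^.*[^0-9]", "", x)

# For each cell line, correlate every pair of its replicate columns.
pairCors <- do.call(rbind, lapply(names(groups), function(g) {
  pairs <- combn(groups[[g]], 2)  # 2 x nPairs matrix of column names
  data.frame(name = g,
             replicas = paste(repOf(pairs[1, ]), repOf(pairs[2, ]), sep = "-"),
             cor = mapply(function(a, b) cor(A[[a]], A[[b]]),
                          pairs[1, ], pairs[2, ], USE.NAMES = FALSE),
             row.names = NULL)
}))
```

pairCors has the same columns as plotdata, so it can be passed to the same ggplot() call.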
Upvotes: 1
Reputation: 31
I suggest that a heat map would be a better representation. A heat map visualizes the correlations between all replicates at once. Replicates from the same cell line should correlate much more strongly with each other than with replicates from other cell lines, so you should see blocks of values close to 1.0 along the diagonal of the heat map, while the weaker correlation between replicates from different cell lines shows up off the diagonal. To draw it, you can use heatmap.2 from the {gplots} package.
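A minimal sketch of that idea, assuming the gplots package is installed and reusing the sample data from the other answer. The argument choices (symm, trace, density.info, margins) are just one reasonable setup, not the only one. Since the sample columns are independent rnorm() draws, only the diagonal will stand out here; with real replicates you would expect high-correlation blocks per cell line.

```r
set.seed(123)
A <- data.frame(A1 = rnorm(100), A2 = rnorm(100), A3 = rnorm(100),
                B1 = rnorm(100), B2 = rnorm(100),
                C1 = rnorm(100), C2 = rnorm(100), C3 = rnorm(100))

# Pearson correlation between every pair of columns (replicates).
ca <- cor(A, method = "pearson")

# Draw the heat map only if gplots is available.
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(ca,
                    symm = TRUE,            # the correlation matrix is symmetric
                    trace = "none",         # hide the default trace lines
                    density.info = "none",  # no histogram in the color key
                    margins = c(5, 5))      # room for the column and row labels
}
```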
Upvotes: 0