Reputation: 1381
I have a data set such as:
name Exp1Res1 Exp1Res2 Exp1Res3 Exp2Res1 Exp2Res2 Exp2Res3
[1] ID1 5 7 9 7 9 2
[2] ID2 6 4 2 9 5 1
[3] ID3 4 9 9 9 11 2
I need to determine the correlation between experiment 1 and experiment 2 for each row. As there are actually 37 columns and 100,000 rows in my dataset (FullSet), my original solution of looping through the rows (see below) is far too slow, so I want to optimize it.
My original solution was:
# One row per ID: the ID plus the two correlation coefficients
df <- data.frame(matrix(ncol = 3, nrow = nrow(FullSet)))
names(df) <- c("ID", "pearson", "spearman")
for (i in seq_len(nrow(FullSet)))
{
  exp1 <- as.numeric(t(FullSet[i, 2:19]))   # experiment 1 results for row i
  exp2 <- as.numeric(t(FullSet[i, 20:37]))  # experiment 2 results for row i
  pears <- cor(exp1, exp2, method = "pearson")
  spear <- cor(exp1, exp2, method = "spearman")
  df[i, ] <- c(FullSet[i, 1], pears, spear)
}
I feel something like this should work:
FullSet$pearson<-cor(as.numeric(t(FullSet[,2:19])),as.numeric(t(FullSet[,20:37])), method="pearson")
but I don't know if/how to reference just the current row in the transpose: t(FullSet[,2:19]) should read something like t(FullSet[<currow>,2:19]).
Help would be appreciated - I don't know if my approach is even correct.
Output should look like this (the values are not correct, they are for example only):
name Pearson Spearman
[1] ID1 0.80 0.75
[2] ID2 0.90 0.80
[3] ID3 0.85 0.70
Upvotes: 1
Views: 538
Reputation: 5497
What about bringing it into the format:
ID EXP Res
1 1 .
1 1 .
1 2 .
1 2 .
by using reshape and then letting plyr do the work:
require(plyr)
ddply(df, .(ID, EXP), summarize, cor(...))
Would that be a possibility, if you do it for Spearman and for Pearson separately?
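To make the idea concrete, here is a rough sketch of the long-format approach on the three example rows from the question. It is only an illustration under a few assumptions: the toy FullSet has 3 result columns per experiment instead of 18, the helper row_cor and the column index vectors are names I made up, and base R's split() and vapply() stand in for plyr so that the snippet runs without extra packages. Note that it groups by ID only, because both experiments have to be in the same group for cor() to compare them. I have not checked whether this is actually faster than the loop on 100,000 rows.

```r
# Toy stand-in for FullSet, using the example values from the question
# (3 result columns per experiment instead of 18).
FullSet <- data.frame(
  name     = c("ID1", "ID2", "ID3"),
  Exp1Res1 = c(5, 6, 4), Exp1Res2 = c(7, 4, 9),  Exp1Res3 = c(9, 2, 9),
  Exp2Res1 = c(7, 9, 9), Exp2Res2 = c(9, 5, 11), Exp2Res3 = c(2, 1, 2)
)

exp1_cols <- 2:4  # assumption: would be 2:19 for the real data
exp2_cols <- 5:7  # assumption: would be 20:37 for the real data
n <- nrow(FullSet)
k <- length(exp1_cols)

# Long format: one row per (ID, EXP, Res) value.
# unlist() stacks the columns one after another, so the IDs repeat
# in their original order once per result column.
long <- data.frame(
  ID  = rep(FullSet$name, times = 2 * k),
  EXP = rep(c(1, 2), each = n * k),
  Res = c(unlist(FullSet[exp1_cols], use.names = FALSE),
          unlist(FullSet[exp2_cols], use.names = FALSE))
)

# Hypothetical helper: correlation between the two experiments
# within the rows belonging to a single ID.
row_cor <- function(d, method) {
  cor(d$Res[d$EXP == 1], d$Res[d$EXP == 2], method = method)
}

# Group by ID only, then compute each coefficient per group.
per_id <- split(long, long$ID)
result <- data.frame(
  name      = names(per_id),
  pearson   = vapply(per_id, row_cor, numeric(1), method = "pearson"),
  spearman  = vapply(per_id, row_cor, numeric(1), method = "spearman"),
  row.names = NULL
)
print(result)
```

With plyr the last step would presumably look like ddply(long, .(ID), summarize, pearson = cor(Res[EXP == 1], Res[EXP == 2])), with a second column using method = "spearman".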
Upvotes: 4