Reputation: 1217
This is my R script with three nested for loops. It takes more than 2 minutes to finish 1 out of 2000 rounds of the outer loop. How can I speed this up?
col  <- NULL
row  <- NULL
rep  <- ncol(dat) - 2   # number of genotype columns (first two are mark/alle)
dist <- NULL
c1 <- 3
for (i in 1:rep) {
  c2 <- 3
  for (j in 1:rep) {
    r <- 1
    for (k in 1:nrow(dat)) {
      # squared difference between columns c1 and c2 for one row
      p <- (dat[r, c1] - dat[r, c2])^2
      row <- rbind(row, p)
      r <- r + 1
    }
    row  <- sqrt(sum(row))
    row1 <- (1/(2*length(unique(dat[, 1])))) * row
    col  <- cbind(col, row1)
    c2  <- c2 + 1
    row <- NULL
  }
  dist <- rbind(dist, col)
  col <- NULL
  c1  <- c1 + 1
}
EDIT:
> head(dat)
mark alle G1 G2 G3 G4 G5 G6 G7 G8 G9 G10 G11 G12 G13 G14 G15 G16 G17 G18 G19 G20 G21 G22 G23 G24
1 M1 228 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0.0 0.5 0 0
2 M1 234 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.5 0.5 1 1
3 M1 232 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0 0.0 0 0
4 M1 240 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0 0.0 0 0
5 M1 230 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.5 0.0 0 0
6 M1 238 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0 0.0 0 0
Upvotes: 0
Views: 200
Reputation: 1217
Although a similar function already exists, I tried my own way. I removed one complete for loop, along with the rbind and cbind calls.
Now this takes only 124 seconds to write the full 1014 x 1014 matrix, against 2 minutes previously for just one round, i.e. a single 1 x 1014 row of that matrix.
dat  <- read.table("alreq1.txt", sep = "\t", header = TRUE)
rep  <- ncol(dat) - 2
dist <- data.frame(matrix(NA, nrow = rep, ncol = rep))  # preallocated result
m    <- 1/sqrt(2*length(unique(dat[, 1])))              # constant factor, hoisted out of the loops
col  <- NULL
c1   <- 3
for (i in 1:rep) {
  c2 <- 3
  for (j in 1:rep) {
    # vectorized over rows: no innermost loop any more
    p      <- na.omit(dat[, c1] - dat[, c2])^2
    col[j] <- sqrt(sum(p)) * m
    c2     <- c2 + 1
  }
  dist[i, ] <- col
  c1  <- c1 + 1
  col <- NULL
}
Hopefully this code can still be improved.
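For what it's worth, the two remaining loops could also be replaced entirely with matrix algebra, assuming the genotype columns contain no NAs (the na.omit above suggests they might); this is only a sketch:
X  <- as.matrix(dat[, -(1:2)])    # genotype columns only
G  <- crossprod(X)                # t(X) %*% X
sq <- diag(G)                     # per-column sums of squares
D2 <- outer(sq, sq, "+") - 2 * G  # D2[i, j] = sum((X[, i] - X[, j])^2)
dist <- sqrt(pmax(D2, 0)) * m     # pmax guards against tiny negative rounding errors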
Upvotes: 1
Reputation: 25736
I don't know the Modified Rogers genetic distance, but it looks like the Euclidean distance multiplied by 1/(2*length(unique(dat$mark))):
f <- 1/(2*length(unique(dat$mark)))                       # scaling factor
d <- f*dist(t(dat[, -c(1, 2)]), method="euclidean")       # pairwise distances between genotype columns
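If a full matrix like the one the loops build is needed, the dist object converts directly:
dist_mat <- as.matrix(d)  # full symmetric matrix, one row/column per genotype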
Upvotes: 4
Reputation: 7592
The biggest thing you can do to speed up the loop is to preallocate the vectors and matrices before the loops. Then, instead of using cbind() and rbind(), add the results to the vectors/matrices like so:
# Was: row<-rbind(row, p)
row[k] <- p
# Was: col<-cbind(col, row1)
col[j] <- row1
# Was: dist<-rbind(dist,col)
dist[i, ] <- col
After that, you can explore ways to vectorize the operation or, better yet, see if a function already exists to perform this task (or if the task is based on something for which a function exists). Further, anything that does not depend on the loop, such as the factor 1/(2*length(unique(dat[,1]))) in the computation of row1, should be moved out of the loop; otherwise you are just recalculating the same value over and over, which negatively affects performance. A complete rewrite along these lines is sketched below.
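Putting preallocation and the hoisted constant together, a rewrite of the question's loops might look like this (a sketch, keeping the question's variable names; the column offset of 2 skips mark and alle):
rep  <- ncol(dat) - 2
f    <- 1/(2*length(unique(dat[, 1])))           # hoisted: does not depend on the loops
dist <- matrix(NA_real_, nrow = rep, ncol = rep) # preallocated result
for (i in 1:rep) {
  for (j in 1:rep) {
    # squared differences for all rows at once, no inner k loop needed
    p <- (dat[, i + 2] - dat[, j + 2])^2
    dist[i, j] <- f * sqrt(sum(p))
  }
}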
The key with loops is avoiding rbind() and cbind(); preallocating the vectors and matrices before the loop will provide a big performance boost.
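To see why, compare growing a result with rbind() against filling a preallocated vector (a toy illustration; exact timings vary by machine):
n <- 10000
system.time({ v <- NULL;       for (i in 1:n) v <- rbind(v, i^2) })  # copies v every iteration
system.time({ v <- numeric(n); for (i in 1:n) v[i] <- i^2 })         # writes in place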
Upvotes: 3