ramesh

Reputation: 1217

R: Speed up for loop

This is my R script with three nested for loops. It takes more than 2 minutes to finish 1 out of 2000 rounds of the loop. How can I speed this up?

col<-NULL
row<-NULL
rep<-ncol(dat)-2
dist<-NULL
c1=3
for (i in 1:rep){
  c2=3
  for(j in 1:rep){
    r=1
    for (k in 1:nrow(dat)){
      p<-(dat[r,c1]-dat[r,c2])^2
      row<-rbind(row, p)
      r=r+1
    }
    row<-sqrt(sum(row))
    row1<-(1/(2*length(unique(dat[,1]))))*row
    col<-cbind(col, row1)
    c2=c2+1
    row<-NULL
  }
  dist<-rbind(dist,col)
  col<-NULL
  c1=c1+1
}

EDIT:

> head(dat)
  mark alle G1 G2 G3 G4 G5 G6 G7 G8 G9 G10 G11 G12 G13 G14 G15 G16 G17 G18 G19 G20 G21 G22 G23 G24
1   M1  228  1  1  1  1  1  1  1  1  1   1   1   1   1   1   1   1   1   1   1   1 0.0 0.5   0   0
2   M1  234  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0 0.5 0.5   1   1
3   M1  232  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0 0.0 0.0   0   0
4   M1  240  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0 0.0 0.0   0   0
5   M1  230  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0 0.5 0.0   0   0
6   M1  238  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0 0.0 0.0   0   0

Upvotes: 0

Views: 200

Answers (3)

ramesh

Reputation: 1217

Although a similar function already exists, I tried my own approach.
I removed one complete for loop and the rbind and cbind calls.
This now takes only 124 seconds to write the whole 1014 x 1014 matrix, compared to 2 minutes for just one round (i.e. 1 x 1014) before.

dat<-read.table("alreq1.txt", sep="\t",header=T)
col<-NULL
row<-NULL
rep<-ncol(dat)-2
dist<-NULL
dist<- data.frame(matrix(NA, nrow = rep, ncol = rep))
m<-1/sqrt(2*length(unique(dat[,1])))
c1=3
for (i in 1:rep){
  c2=3
  for(j in 1:rep){
      p<-na.omit(dat[,c1]-dat[,c2])^2
      row<-sum(p)
      row<-sqrt(row)*m
      col[j] <- row
      c2=c2+1
      row<-NULL
      p<-NULL
  }
  dist[i,] <- col
  c1=c1+1
  col<-NULL
}

Hopefully, this code can still be improved.

Upvotes: 1

sgibb

Reputation: 25736

I don't know the Modified Rogers genetic distance, but it looks like the Euclidean distance multiplied by 1/(2*length(unique(dat$mark))):

f <- 1/(2*length(unique(dat$mark)))
d <- f*dist(t(dat[, -c(1, 2)]), method="euclidean")

Upvotes: 4
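A quick sanity check of this on a toy data frame shaped like the question's dat (the values and the two-marker setup here are made up for illustration, not the real data); as.matrix() expands the dist object into the full symmetric matrix the question's loops were building:

```r
# Toy data frame with the same layout as the question's `dat`
# (columns: mark, alle, then one column per genotype).
dat <- data.frame(mark = c("M1", "M1", "M2"),
                  alle = c(228, 234, 230),
                  G1 = c(1, 0, 0.5),
                  G2 = c(0, 1, 0.5))

f <- 1 / (2 * length(unique(dat$mark)))        # 2 unique markers -> f = 0.25
d <- f * dist(t(dat[, -c(1, 2)]), method = "euclidean")

m <- as.matrix(d)                              # full symmetric rep x rep matrix
m["G1", "G2"]                                  # distance between genotypes G1 and G2
```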

Christopher Louden

Reputation: 7592

The biggest thing you can do to speed up the loop is to preallocate the vectors and matrices before the loops. Then, instead of using cbind() and rbind(), assign the results into the vectors/matrices like so:

# Was: row<-rbind(row, p)
row[k] <- p

# Was: col<-cbind(col, row1)
col[j] <- row1

# Was: dist<-rbind(dist,col)
dist[i, ] <- col

After that, you can explore ways to vectorize the operation or, better yet, check whether a function already exists to perform this task (or whether the task is based on something for which a function exists). Further, anything that does not depend on the loop indices (such as the constant 1/(2*length(unique(dat[,1])))) should be moved out of the loop. Otherwise you are just recalculating the same value over and over, which negatively affects performance.
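Putting those pieces together, the question's triple loop might look like this with preallocation and the constant hoisted out (a sketch only; the small dat built at the top stands in for the real data, whose layout is assumed from the question):

```r
# Toy data frame shaped like the question's `dat` (values are illustrative).
dat <- data.frame(mark = c("M1", "M1"), alle = c(228, 234),
                  G1 = c(1, 0), G2 = c(0, 1), G3 = c(0.5, 0.5))

rep <- ncol(dat) - 2                      # number of genotype columns
n   <- nrow(dat)
f   <- 1 / (2 * length(unique(dat[, 1]))) # constant: computed once, outside the loops

dist <- matrix(NA_real_, nrow = rep, ncol = rep)  # preallocated result matrix
row  <- numeric(n)                                # preallocated, reused inner buffer

for (i in 1:rep) {
  for (j in 1:rep) {
    for (k in 1:n) {
      row[k] <- (dat[k, i + 2] - dat[k, j + 2])^2  # assign in place, no rbind()
    }
    dist[i, j] <- f * sqrt(sum(row))               # assign in place, no cbind()/rbind()
  }
}
```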

The key with loops is avoiding rbind() and cbind(): preallocating the vectors and matrices before the loop provides most of the performance boost.

Upvotes: 3
