johnny utah

Reputation: 279

Can I speed up an iteration process in R?

I am inexperienced with iteration in R and am hoping to speed up a process as I am implementing some analysis in a website.

I found a very useful tutorial that lets me iterate through a matrix, pick out the values above a certain threshold (>0.01) and then populate three vectors ("source", "target" and "corr") with these values to eventually make a nicely organised data frame.

source=c()
target=c()
corr<-c()

g1<-rownames(adj_mat)[1:dim(adj_mat)[1]]
g2<-g1

for(gene in g1){
  for(gen in g2){
    if(adj_mat[gene,gen]>0.01){
      source<-c(source,gene)
      target<-c(target,gen)
      corr<-c(corr,adj_mat[gene,gen])
    }
  }
}
network<-data.frame(source,target,corr)

While this works well for small matrices (dimensions of 1000 x 500), it takes a frustrating amount of time with larger ones (dimensions of 10000 x 700)...

The matrix comes from a file that will be uploaded to and read by the website each time, so I cannot fix the sizes of the 'source' etc. objects in advance: each newly uploaded file will be of a different size.

Can anyone tell me if there is a more efficient way to do this in R?

Upvotes: 1

Views: 86

Answers (2)

sdgfsdh

Reputation: 37045

Vectorization is very important for writing performant R. This allows as much work to be done in native code as possible, with minimal transfers of values between R and native code.

For example:

# Slow
a <- c(1, 2, 3)
b <- c(4, 5, 6)

r <- c()

for (i in seq_along(a)) {
    r <- c(r, a[i] + b[i])
}

# Fast
r <- a + b

The latter is faster because the slow method calls + three times, once per iteration, and reallocates r with c() on every pass, whereas the fast method calls + once. You should try to batch things as much as possible. It's also much shorter code!
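To make the difference visible, here is a rough timing sketch. The vector length n is an arbitrary value chosen for illustration, and the actual timings will vary by machine, so treat it as a way to compare the two approaches rather than as a benchmark.

```r
# Illustrative only: timings depend on the machine and on n.
n <- 1e4            # hypothetical size chosen for this sketch
a <- runif(n)
b <- runif(n)

# Growing r_loop with c() allocates a new, longer vector on every iteration.
grow_time <- system.time({
  r_loop <- c()
  for (i in seq_along(a)) {
    r_loop <- c(r_loop, a[i] + b[i])
  }
})

# A single vectorized call does the same work in one step.
vec_time <- system.time(r_vec <- a + b)

print(grow_time["elapsed"])
print(vec_time["elapsed"])

# Both approaches should produce the same values.
print(all.equal(r_loop, r_vec))
```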

But what about conditionals? Suppose you want to optimize:

# Slow
a <- c(1, 2, 3)
b <- c(4, 5, 6)

r <- c()

for (i in seq_along(a)) {
    if (a[i] > b[i] / 2) {
        r <- c(r, a[i] + b[i])
    } else {
        r <- c(r, a[i] - b[i])
    }
}

You can use ifelse:

# Fast
a <- c(1, 2, 3)
b <- c(4, 5, 6)

r <- ifelse(a > b / 2, a + b, a - b)
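With the values above, the condition happens to be FALSE for every element, so only the else branch is used. As a small check with inputs that exercise both branches (the values of a and b below are made up for this sketch), the loop and ifelse give the same result:

```r
# Made-up inputs so that the condition is TRUE for one element and FALSE for the others.
a <- c(1, 4, 3)
b <- c(4, 5, 6)

# Element-by-element version, writing into a preallocated vector.
r_loop <- numeric(length(a))
for (i in seq_along(a)) {
  r_loop[i] <- if (a[i] > b[i] / 2) a[i] + b[i] else a[i] - b[i]
}

# Vectorized version: the condition is evaluated for all elements at once.
r_vec <- ifelse(a > b / 2, a + b, a - b)

print(r_vec)                      # -3  9 -3
print(all.equal(r_loop, r_vec))   # TRUE
```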

Take a look at @PaulHiemstra's answer for an application to your code.

Upvotes: 2

Paul Hiemstra

Reputation: 60944

The biggest problem I can see right now is that you build your data structures (source, target and corr) iteratively, growing them by one element at a time. You can tremendously speed up your code by preallocating your objects to the correct size and using indices to place the values.
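A minimal sketch of that idea, assuming a small random matrix as a stand-in for the uploaded adj_mat (the gene names and values here are invented). The final number of matches is not known in advance, but once the file is read it can never exceed the number of cells in the matrix, so this sketch preallocates to that upper bound and trims the unused tail afterwards. It also assumes the column names are the ones to loop over for the target.

```r
# Hypothetical stand-in for the uploaded matrix.
set.seed(1)
genes <- paste0("g", 1:5)
adj_mat <- matrix(runif(25, 0, 0.02), nrow = 5, dimnames = list(genes, genes))

# At most every cell can pass the threshold, so this is a safe upper bound.
max_n  <- length(adj_mat)
source <- character(max_n)
target <- character(max_n)
corr   <- numeric(max_n)

k <- 0  # number of slots filled so far
for (gene in rownames(adj_mat)) {
  for (gen in colnames(adj_mat)) {
    if (adj_mat[gene, gen] > 0.01) {
      k <- k + 1
      source[k] <- gene
      target[k] <- gen
      corr[k]   <- adj_mat[gene, gen]
    }
  }
}

# Drop the unused tail of each vector.
keep <- seq_len(k)
network <- data.frame(source = source[keep],
                      target = target[keep],
                      corr   = corr[keep])
```

This keeps the loop structure from the question but avoids copying the vectors on every match. It is still an element-by-element loop, so a vectorized approach is likely to be faster again.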

You can further improve your code by vectorizing your operation. For example, if m is your matrix (adj_mat in your code), extracting the values of m that are larger than 0.01 can easily be done like this:

m[m > 0.01]

You can then get your data structures source, target and corr like this:

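# Row and column positions of every entry above the threshold
matching_indices = which(m > 0.01, arr.ind = TRUE)
# These are numeric positions; use rownames(m)[source] etc. to get names
source = matching_indices[,1]
target = matching_indices[,2]
# Extracted in the same column-by-column order as which(), so values line up
corr = m[m > 0.01]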

This is just example code; I'm not entirely sure it is exactly what you need, but it is a good step towards it.
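Putting the pieces together, one way this might look for the data frame in the question is sketched below. The matrix m is a small random stand-in with invented gene names, and the sketch assumes the row and column names are what should end up in source and target.

```r
# Hypothetical stand-in for the uploaded matrix.
set.seed(1)
genes <- paste0("g", 1:5)
m <- matrix(runif(25, 0, 0.02), nrow = 5, dimnames = list(genes, genes))

# One row per matching cell: column 1 is the row index, column 2 the column index.
idx <- which(m > 0.01, arr.ind = TRUE)

# Map the numeric positions back to names and pull out the matching values.
network <- data.frame(
  source = rownames(m)[idx[, 1]],
  target = colnames(m)[idx[, 2]],
  corr   = m[idx]
)

print(head(network))
```

Note that which() walks the matrix column by column, so the rows of network come out in a different order from the nested loop in the question. If the original order matters, something like network[order(idx[, 1], idx[, 2]), ] should restore it.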

Upvotes: 2
