Parallelization/Optimization of R loops containing *apply

Question

I am working on implementing an algorithm where I try to find 5 vectors out of 20 that are "furthest apart" by some measure. To do that I use combnPrim, which gives me a list of some 77000 vectors representing all 5-vector combinations. Each vector has around 25 elements.

To parallelize the loops below, I tried the doParallel library, but I keep messing it up somehow and get -Inf as a result. I read the doParallel documentation but could not apply what I saw there to my case. It is highly probable that my lack of knowledge of R makes the problem seem harder than it actually is.

# df2 can be thought of as (thanks to @Oliver):
df2 <- as.data.frame(replicate(20, rnorm(10)))
names(df2) <- LETTERS[1:20]


library(gRbase)  # provides combnPrim()
comb <- combnPrim(df2,5)
range <- length(comb)/5
result_vector <- vector(mode="list",length = range )
for (i in seq(range))
{
     total <- as.numeric(0)
     for ( j in seq(4))
     {
          for ( k in seq(j+1,5))
          {
              diff <- sum( ( mapply( '/',unlist( comb[,i][j] ) - unlist( comb[,i][k] ), ( unlist(comb[,i][j] ) + unlist( comb[,i][k] )) / 2 )^2))
              total = total + diff
          }
     }
     result_vector[[i]] <- total
}

So the question is: how could I approach this problem to make the computation run faster? My approach was to parallelize the outermost loop, where the range variable is ~15000. All threads would need access to comb and share the variable result_vector. I believe my approach is not impossible, but I would need some guidance.

Cole · Accepted Answer

This approach relies on creating a helper function and then doing the inner loop using the base combn() function.

fn_dist <- function(x, y){
  sum(((x - y) / ((x+y) / 2))^2)
}
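
As a quick sanity check, here is a minimal sketch comparing the helper with the mapply formulation from the question on two made-up vectors (x and y are illustrative values, not data from the question):

```r
fn_dist <- function(x, y) {
  sum(((x - y) / ((x + y) / 2))^2)
}

# Illustrative inputs only
x <- c(1, 2, 3)
y <- c(3, 2, 1)

# Formulation from the question: element-wise division via mapply
original <- sum(mapply("/", x - y, (x + y) / 2)^2)

# Vectorized helper: "/" already works element-wise on vectors
vectorized <- fn_dist(x, y)

original    # 2
vectorized  # 2
```

Both give the same value because arithmetic operators in R are already vectorized, so the mapply call only adds overhead.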

system.time({
result_vector3 <- apply(comb, 2, function(comb_i) sum(combn(5, 2, FUN = function(x) fn_dist(comb_i[[x[1]]], comb_i[[x[2]]]))))
})

#   user  system elapsed 
#   1.12    0.00    1.15

The use of apply was intentional, as future_apply is very easy to use. Unfortunately, it performs worse on my 2-core machine:

library(future.apply)

plan(multisession)  # originally plan(multiprocess), which is deprecated in newer versions of future

system.time({
  result_vector_future <- future_apply(comb, 2, function(comb_i) sum(combn(5, 2, FUN = function(x) fn_dist(comb_i[[x[1]]], comb_i[[x[2]]]))))
})

#   user  system elapsed 
#   1.59    0.03    1.92
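
For comparison, splitting the outer loop across workers by hand might look roughly like the sketch below, which uses the parallel package that ships with R. It is not benchmarked and rests on several assumptions: a smaller stand-in data frame (df_small, 12 columns), two local workers, and a hypothetical helper score_cols that works on column indices rather than on the comb matrix.

```r
library(parallel)

set.seed(1)
# Smaller stand-in for df2 (12 columns instead of 20) to keep the sketch quick
df_small <- as.data.frame(replicate(12, rnorm(10)))

fn_dist <- function(x, y) {
  sum(((x - y) / ((x + y) / 2))^2)
}

# Hypothetical helper: total pairwise measure for one set of column indices
score_cols <- function(cols, df, pair_dist) {
  pairs <- utils::combn(length(cols), 2)
  sum(apply(pairs, 2, function(p) pair_dist(df[[cols[p[1]]]], df[[cols[p[2]]]])))
}

idx <- combn(ncol(df_small), 5)  # 5 x 792 matrix of column indices

# Sequential reference result
result_seq <- vapply(seq_len(ncol(idx)),
                     function(i) score_cols(idx[, i], df_small, fn_dist),
                     numeric(1))

cl <- makeCluster(2)                           # assumption: 2 local workers
chunks <- splitIndices(ncol(idx), length(cl))  # one block of combinations per worker
res <- parLapply(cl, chunks, function(ch, idx, df, score_cols, pair_dist) {
  vapply(ch, function(i) score_cols(idx[, i], df, pair_dist), numeric(1))
}, idx = idx, df = df_small, score_cols = score_cols, pair_dist = fn_dist)
stopCluster(cl)

# Each worker returns its own block, so nothing is shared or overwritten
result_par <- unlist(res)
stopifnot(isTRUE(all.equal(result_seq, result_par)))
```

Each worker returns the results for its block and the blocks are concatenated afterwards, so no shared result_vector is needed. For a job that takes about a second sequentially, the cost of starting workers and shipping data can easily outweigh the gain, which is consistent with the future_apply timing above.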

If you prefer a for loop, these small changes make it similar in performance to the regular apply statement:

system.time({
for (i in seq(range)){
  total <- as.numeric(0)
  comb_i <- comb[, i]
  for ( j in seq(4))
  {
    for ( k in seq(j+1,5))
    {
      diff <- fn_dist(comb_i[[j]], comb_i[[k]])
      # diff <- sum( ( (unlist( comb[,i][j] ) - unlist( comb[,i][k] )) / (( unlist(comb[,i][j]) + unlist( comb[,i][k] ) ) / 2 ) )^2 )
      total = total + diff
    }
  }
  result_vector[[i]] <- total
}
})

#   user  system elapsed 
#   1.24    0.05    1.32

For reference, using @jogo's suggestion and removing just the mapply helps a lot, but the workarounds above help a bit more.

system.time({
for (i in seq(range)){
  total <- as.numeric(0)
  # comb_i <- comb[, i]
  for ( j in seq(4))
  {
    for ( k in seq(j+1,5))
    {
      # diff <- fn_dist(comb_i[[j]], comb_i[[k]])
      diff <- sum( ( (unlist( comb[,i][j] ) - unlist( comb[,i][k] )) / (( unlist(comb[,i][j]) + unlist( comb[,i][k] ) ) / 2 ) )^2 )
      total = total + diff
    }
  }
  result_vector[[i]] <- total
}
})

#   user  system elapsed 
#   2.40    0.06    2.50

And finally, this is very similar to dist. If you are comfortable with its default method (Euclidean distance, which gives different values from the measure in the question), you could use:

system.time({
results_different_method <- apply(comb,2, function(l) sum(stats::dist(do.call(rbind,l))))
})

#   user  system elapsed 
#   0.70    0.00    0.74
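
To illustrate why that result differs, here is a small sketch on two made-up vectors (x and y are illustrative values only) comparing the default stats::dist value with fn_dist:

```r
fn_dist <- function(x, y) {
  sum(((x - y) / ((x + y) / 2))^2)
}

# Illustrative inputs only
x <- c(1, 2, 3)
y <- c(3, 2, 1)

# Default stats::dist method is Euclidean: sqrt(sum((x - y)^2))
euclid <- as.numeric(stats::dist(rbind(x, y)))

# Measure from the question: squared differences relative to the pairwise mean
custom <- fn_dist(x, y)

euclid  # 2.828427
custom  # 2
```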

To keep the same measure as in the question, pass fn_dist to proxy::dist:

library(proxy)

system.time({
result_same_as_OP <- apply(comb, 2, function (l) sum(proxy::dist(do.call(rbind, l), method = fn_dist)))
})

#   user  system elapsed 
#   1.58    0.05    1.67

I also tried to get it down to a one-liner, but it was slower:

system.time({
result_final <- combn(ncol(df2), 5, FUN = function(cols) sum(proxy::dist(t(df2[, cols]), method = fn_dist)))
}) 

#   user  system elapsed 
#   3.71    0.08    3.80
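
One further idea, not benchmarked here: every pair of columns appears in many of the 5-column combinations, so the 190 pairwise values could be computed once and then looked up. A minimal sketch, assuming the df2 and fn_dist definitions from above (pair_mat, idx, and direct are names introduced just for this illustration):

```r
set.seed(42)
df2 <- as.data.frame(replicate(20, rnorm(10)))
names(df2) <- LETTERS[1:20]

fn_dist <- function(x, y) {
  sum(((x - y) / ((x + y) / 2))^2)
}

n <- ncol(df2)

# Symmetric matrix of pairwise measures, computed once (diagonal stays 0)
pair_mat <- matrix(0, n, n)
for (a in seq_len(n - 1)) {
  for (b in (a + 1):n) {
    pair_mat[a, b] <- pair_mat[b, a] <- fn_dist(df2[[a]], df2[[b]])
  }
}

# All 5-column index combinations: a 5 x 15504 integer matrix
idx <- combn(n, 5)

# Each pair is counted twice in the symmetric sub-matrix, hence the division by 2
result_lookup <- apply(idx, 2, function(cols) sum(pair_mat[cols, cols]) / 2)

# Spot check against the direct computation for the first few combinations
direct <- function(cols) {
  sum(combn(5, 2, FUN = function(p) fn_dist(df2[[cols[p[1]]]], df2[[cols[p[2]]]])))
}
stopifnot(isTRUE(all.equal(result_lookup[1:20], apply(idx[, 1:20], 2, direct))))
```

This relies on fn_dist being symmetric in its two arguments, which holds for the measure in the question.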

I'll organize these thoughts more later.
