How to speed up parallel foreach in R

Question

I want to run a series of approximately 1,000,000 Wilcoxon tests (wilcox.test) in R, one per column of data:

library(foreach)
library(dplyr)  # provides bind_rows

result <- foreach(i = seq_len(ncol(data)), .combine = bind_rows,
                  .multicombine = TRUE, .maxcombine = 1000) %do% {

  # Wilcoxon rank-sum test of column i against the grouping factor
  w <- wilcox.test(data[, i] ~ as.factor(groups), exact = FALSE)

  # Build the one-row result directly so statistic and p.value stay numeric
  # (filling the row via c(...) would coerce every value to character)
  df <- data.frame(cg = colnames(data)[i],
                   statistic = unname(w$statistic),
                   p.value = w$p.value,
                   stringsAsFactors = FALSE)

  rownames(df) <- colnames(data)[i]

  df

}

If I run it with %dopar% on 15 cores, it is slower than the single-core %do% version. I suspect a memory access problem, and the CPUs are far from fully utilized. Is it possible to split the data data frame into chunks, have each worker process one chunk (about 100,000 columns each), and then combine the results? How can I speed up this foreach loop?
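For illustration, the chunking idea described above might look roughly like the sketch below. It is not a tested solution for the real data: the toy `data` matrix, the `groups` vector, and `n_workers` are assumptions made up for the example, and it assumes the foreach and doParallel packages are installed.

```r
library(foreach)
library(doParallel)

# Toy stand-ins for the real inputs (assumptions for illustration only)
set.seed(1)
data   <- matrix(rnorm(40 * 200), nrow = 40,
                 dimnames = list(NULL, paste0("cg", 1:200)))
groups <- rep(c("a", "b"), each = 20)

n_workers <- 2                      # hypothetical; e.g. 15 on the real machine
cl <- makeCluster(n_workers)
registerDoParallel(cl)

# Split the column indices into one contiguous chunk per worker
chunks <- split(seq_len(ncol(data)),
                cut(seq_len(ncol(data)), n_workers, labels = FALSE))

g <- as.factor(groups)              # convert once instead of once per test

result <- foreach(idx = chunks, .combine = rbind) %dopar% {
  # Each task runs many tests and returns a single data frame,
  # so results are combined once per chunk rather than once per column
  stat <- numeric(length(idx))
  pval <- numeric(length(idx))
  for (k in seq_along(idx)) {
    w <- wilcox.test(data[, idx[k]] ~ g, exact = FALSE)
    stat[k] <- w$statistic
    pval[k] <- w$p.value
  }
  data.frame(cg = colnames(data)[idx], statistic = stat, p.value = pval,
             stringsAsFactors = FALSE)
}

stopCluster(cl)
```

The point of the sketch is that each parallel task does a large batch of work, so per-task scheduling and combine overhead is paid once per chunk instead of a million times. With a cluster created this way, `data` is still sent to every worker, so memory use per worker remains something to watch.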
