gccd

Reputation: 49

R parallel processing with nested loops

I am trying to speed up the code below, but I am not sure whether to parallelize only the outermost loop or the outer and inner loops together. I am working on Ubuntu with 2 processors. I do not know how many threads each processor would create for this task, or whether spawning many threads would bring complications that I should be aware of and control with locks. What would you recommend?

library(foreach)
library(doParallel)

nc = detectCores()
cl = makeCluster(nc, type = "FORK")
registerDoParallel(cl)

pts <- list(chunkSize=2)

    foreach(H = 0:HexC, .combine = "c") %:% {
        foreach(HN = 0:HNcC, .combine = "c") %dopar% {
            foreach(F = 0:FucC, .combine = "c") %dopar% {
                foreach(SA = 0:SAC, .combine = "c") %dopar% {
                    foreach(SO3 = 0:SO3C, .combine = "c") %dopar% {
                        NAmax <- sum(SA + SO3)
                        foreach(NAD = 0:NAmax, .combine = "c") %dopar% {
                            Na_Cnt <- c(Na_Cnt, NAD)
                            SO3_Cnt <- c(SO3_Cnt, SO3)
                            SA_Cnt <- c(SA_Cnt, SA)
                            Fuc_Cnt <- c(Fuc_Cnt, F)
                            HexNAc_Cnt <- c(HexNAc_Cnt, HN)
                            Hex_Cnt <- c(Hex_Cnt, H)

                            Na_Mass <- c(Na_Mass, NAD*NaAdductMass)
                            SO3_Mass <- c(SO3_Mass, SO3*dels["SO3"])
                            SA_Mass <- c(SA_Mass, SA*dels["SA"])
                            Fuc_Mass <- c(Fuc_Mass, F*dels["Fuc"])
                            HexNAc_Mass <- c(HexNAc_Mass, HN*dels["HexNAc"])
                            Hex_Mass <- c(Hex_Mass, H*dels["Hex"])
                        }
                    }
                }
            }
        }
    }

stopImplicitCluster()
stopCluster(cl)

Upvotes: 0

Views: 774

Answers (1)

Katia

Reputation: 3914

There are multiple problems with your code. It calculates each output variable from the value of that variable in the previous iteration, which does not work when the iterations run in parallel, for example:

Na_Mass<- c(Na_Mass, NAD*NaAdductMass)

Instead you need to do something like this:

Na_Mass <- foreach(NAD = 0:NAmax, .combine = "c") %dopar% {
   return(NAD*NaAdductMass)
}

For more examples see the documentation of the doParallel package: https://cran.r-project.org/web/packages/doParallel/vignettes/gettingstartedParallel.pdf
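To make the difference concrete, here is a minimal sketch that contrasts the two patterns. It assumes a 2-worker cluster and a made-up value for NaAdductMass (the real value is not given in the question); `acc` is a hypothetical name used only for the demonstration.

```r
library(foreach)
library(doParallel)

cl <- makeCluster(2)            # assumption: two workers, as in the 2-processor setup
registerDoParallel(cl)

NaAdductMass <- 22              # hypothetical value, for illustration only
acc <- numeric(0)               # hypothetical accumulator

# Assigning inside %dopar% only changes the worker's private copy of acc.
invisible(foreach(NAD = 0:3) %dopar% { acc <- c(acc, NAD * NaAdductMass) })
length(acc)                     # acc on the master is still empty

# Returning the value lets foreach collect and combine the results.
Na_Mass <- foreach(NAD = 0:3, .combine = "c") %dopar% { NAD * NaAdductMass }

stopCluster(cl)
```

The first loop does all the work and then discards it, because each worker modifies its own copy of the variable; only the value returned by the loop body reaches the master process.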

As for the number of CPU cores: nesting %dopar% this way asks for roughly HNcC * FucC * SAC * NAmax parallel tasks, which is probably a very large number. Your computer has only 2 processors, so you risk overloading it, and each parallel R process will not get enough CPU resources and will run significantly slower. I would parallelize no more than one loop.
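A rough sketch of what parallelizing a single loop could look like, reduced to two of the six dimensions to keep it short. The limits and the mass values in `dels` are made up for illustration, and the 2-worker cluster is an assumption based on the 2 processors mentioned in the question.

```r
library(foreach)
library(doParallel)

# Hypothetical small limits and masses, for illustration only.
HexC <- 2
HNcC <- 2
dels <- c(Hex = 162.05, HexNAc = 203.08)

cl <- makeCluster(2)            # assumption: one worker per processor
registerDoParallel(cl)

# Only the outermost loop uses %dopar%; each task returns a data frame
# and foreach stacks the pieces with rbind.
res <- foreach(H = 0:HexC, .combine = rbind) %dopar% {
  HN <- 0:HNcC                  # inner dimension handled as a plain vector
  data.frame(Hex_Cnt = H,
             HexNAc_Cnt = HN,
             Hex_Mass = H * dels[["Hex"]],
             HexNAc_Mass = HN * dels[["HexNAc"]])
}

stopCluster(cl)
```

With this layout the number of tasks equals the length of the outer range, and the worker count is fixed by makeCluster rather than by the loop sizes.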

One more note: growing Na_Cnt and the other objects one element at a time inside the loop is VERY slow, with or without parallelization:

Na_Cnt<- c(Na_Cnt, NAD)

Instead you should vectorize:

Na_Cnt <- 0:NAmax

Similarly:

  SO3_Cnt <- rep(SO3, NAmax+1)
  SA_Cnt <- rep(SA, NAmax+1)
  Fuc_Cnt <- rep(F, NAmax+1)
  HexNAc_Cnt <- rep(HN, NAmax+1)
  Hex_Cnt <- rep(H, NAmax+1)

The same applies to all other statements in your innermost loop. This will be significantly faster and you will not need any parallelization.
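Taking the same idea one step further, all six loops can be replaced by vector operations. The sketch below is one possible way to do it in base R, not the only one; the limits, NaAdductMass and the entries of `dels` are hypothetical values, and `grid`, `n` and `idx` are helper names introduced here. The loop variable F is called Fuc to avoid masking R's built-in F.

```r
# Hypothetical limits and masses, for illustration only.
HexC <- 1; HNcC <- 1; FucC <- 1; SAC <- 1; SO3C <- 1
NaAdductMass <- 22
dels <- c(Hex = 162.05, HexNAc = 203.08, Fuc = 146.06, SA = 291.10, SO3 = 79.96)

# expand.grid varies its first argument fastest, so listing the innermost
# loop variable first reproduces the order of the nested loops.
grid <- expand.grid(SO3 = 0:SO3C, SA = 0:SAC, Fuc = 0:FucC,
                    HN = 0:HNcC, H = 0:HexC)

# Each combination is repeated once per NAD value in 0:NAmax.
n   <- grid$SA + grid$SO3 + 1
idx <- rep(seq_len(nrow(grid)), times = n)

Na_Cnt     <- sequence(n) - 1          # 0:NAmax for every combination
SO3_Cnt    <- grid$SO3[idx]
SA_Cnt     <- grid$SA[idx]
Fuc_Cnt    <- grid$Fuc[idx]
HexNAc_Cnt <- grid$HN[idx]
Hex_Cnt    <- grid$H[idx]

# Masses follow from the counts by element-wise multiplication.
Na_Mass     <- Na_Cnt * NaAdductMass
SO3_Mass    <- SO3_Cnt * dels[["SO3"]]
SA_Mass     <- SA_Cnt * dels[["SA"]]
Fuc_Mass    <- Fuc_Cnt * dels[["Fuc"]]
HexNAc_Mass <- HexNAc_Cnt * dels[["HexNAc"]]
Hex_Mass    <- Hex_Cnt * dels[["Hex"]]
```

Every vector is allocated once at its final length, so there is no repeated copying from c() inside a loop.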

Upvotes: 3
