rewove

Reputation: 79

R Parallel Programming with Two Loops and Storing Results

I have a function with two nested loops, and the result is two lists of data.

The structure:

my_function <- function() {
  list_A <- c()
  list_B <- c()
  for (i in 1:50) {
    a_list <- c()
    b_list <- c()
    for (j in 1:100) {
      # do something, get results a and b,
      # and add them to a_list and b_list
    }
    # use the series of a and b to calculate two parameters A and B,
    # then put A and B into their lists:
    #   list_A <- append(list_A, A)
    # or
    #   list_B <- cbind(list_B, B)   # I don't know which one is better
  }
  # plot the figure using list_A and list_B

  # save the results:
  #   build a data frame from list_A and list_B
  #   write the data frame to csv
}

The code needs to simulate 5000 times, and each step takes at least 1 minute:

  1. I want to run this whole function in parallel. I tried lapply, but it only works well with one loop; when I do so, the results are not consistent and the plot does not work, i.e. I cannot get the results.

I have also found that some parallel code does not work on Windows and some does not work on Mac, which confuses me.

Each step in the loop is independent, so one alternative I thought of is to split the jobs and run them simultaneously, but I need the results to stay in a consistent order (the order they would normally come in).

  2. To use the data for further plots I need to save the results. I am having trouble with this (see above) as well as with the parallel part.

The way I save the results looks like a mess. For example, what I want is:

A    B
0    0
0.1  1
1.2  4
3    9
6    12
...  ...

but what I got is:

    V1
0    0   0.1  1  1.2  4   3    9  6    12  ... ...

I don't know how to save two columns of data from parallel code.

Upvotes: 1

Views: 435

Answers (1)

mischva11

Reputation: 2956

I like using the foreach package for tasks like this (check the documentation). This function is like a for loop, but it works on a cluster. So each for iteration is done separately and is combined afterwards. I made a small example with the structure you are using. You can modify this for your task.

library(foreach)
library(doParallel)
# number of worker processes in your cluster, I chose 4
cl <- makeCluster(4)
registerDoParallel(cl)
# replace z = 1:10 with your range; .combine declares how the per-iteration results are combined afterwards,
# .inorder makes sure the values come back in the right order (TRUE is the default)
df <- foreach(z = 1:10, .combine = rbind, .inorder = TRUE) %dopar% {
    list_a <- numeric(0)
    list_b <- numeric(0)
    for (i in 1:50) {
      for (j in 1:100) {
        # some random task you are doing
        a <- i
        b <- 50 - i
      }
      # collect the values from this outer iteration
      list_a <- c(list_a, a)
      list_b <- c(list_b, b)
    }
    # the last expression is what foreach combines, so end with the two-column data frame
    data.frame(A = list_a, B = list_b)
}
# with .combine = rbind the result is already a data frame with columns A and B;
# as.data.frame() is kept only as a harmless safeguard
df <- as.data.frame(df)
View(df)

stopCluster(cl)
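
If you would rather not depend on foreach, here is a minimal sketch of the same idea using only the parallel package that ships with R. It assumes a PSOCK cluster (the makeCluster default), which should behave the same on Windows, Mac and Linux. The function run_outer, the placeholder arithmetic, the worker count of 2 and the temporary output path are all hypothetical stand-ins for your own code.

```r
library(parallel)

# Hypothetical stand-in for one outer iteration: run the inner loop,
# then reduce the series of a and b to two parameters A and B.
run_outer <- function(i) {
  a <- numeric(100)
  b <- numeric(100)
  for (j in 1:100) {
    a[j] <- i * j   # placeholder for the real computation
    b[j] <- i + j   # placeholder for the real computation
  }
  # one row per outer iteration, already in two named columns
  data.frame(A = mean(a), B = max(b))
}

cl <- makeCluster(2)                       # worker count is an assumption
results <- parLapply(cl, 1:50, run_outer)  # list of results, in the same order as 1:50
stopCluster(cl)

# stack the one-row data frames into a single 50 x 2 data frame
df <- do.call(rbind, results)

# hypothetical output location; replace with your own file name
out_file <- file.path(tempdir(), "results.csv")
write.csv(df, out_file, row.names = FALSE)
```

Because parLapply returns its results in input order, row k of df belongs to i = k, so the plot can be drawn from df$A and df$B after the parallel part has finished. Returning a small data frame from each task, instead of appending to a shared list, is what keeps A and B in separate columns in the CSV.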

Upvotes: 2
