Aravind Raj Reddy

Reputation: 59

Parallel processing using R parLapply

I have R code that can be reduced to the simplified version shown below.

cl <- parallel::makeCluster(2, type="SOCK")
b<-data.frame(c(1,1,2,2,3,3,4,4,7,7,9,9,11,11,12,12,13,13,14,14))
colnames(b)<-c("col1")
b_uni<-unique(b)
clusterExport(cl,"b_uni")

bbb <- parallel::parLapply(cl,1:nrow(b_uni), fun=function(i,b) {
e<-b[b$col2==b_uni[i,1],]
a<-e+10
return(a)
}b=b)

c <- na.omit(do.call(rbind, bbb))

To minimize the number of loop iterations, I run the function only over the unique values in b. However, the variables bbb and c are not being populated.

Upvotes: 1

Views: 278

Answers (1)

clemens

Reputation: 6813

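The object b is not being passed to your parLapply() call (note the missing comma before b = b). With lapply() the function can see objects in the global environment, but with parLapply() it runs in separate worker processes, so you have to pass those objects explicitly. So if you change your code to this: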

bbb <- parallel::parLapply(cl,1:nrow(b_uni), fun=function(i,b) {
  e<-b[b$col2==b_uni[i,1],]
  a<-e+10
  return(a)
}, b = b)

it will work.
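To illustrate the two usual ways of getting an object onto the workers, here is a minimal sketch. The names shift_by and values are hypothetical and only serve the illustration; the calls themselves (makeCluster, parLapply, clusterExport, stopCluster) are from the parallel package. Extra named arguments after the function are forwarded to it on each worker, which is what b = b does above, while clusterExport() copies an object into each worker's global environment, which is what the question already does for b_uni.

```r
library(parallel)

cl <- makeCluster(2)    # two local worker processes

shift_by <- 10          # hypothetical object that exists only in the main session
values <- c(1, 2, 3)    # hypothetical input

# Option 1: pass the object as an extra named argument.
# parLapply() forwards arguments given after the function to each call.
res_arg <- parLapply(cl, values, function(x, shift_by) x + shift_by,
                     shift_by = shift_by)

# Option 2: copy the object to every worker first, then refer to it by name.
# envir = environment() exports from wherever this code is being run.
clusterExport(cl, "shift_by", envir = environment())
res_export <- parLapply(cl, values, function(x) x + shift_by)

stopCluster(cl)         # always release the workers when done

identical(res_arg, res_export)
```

Passing arguments keeps the function self-contained, while exporting is convenient when many calls share the same large object.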

EDIT: The elements of bbb are empty because b does not have a column called col2, so the comparison b$col2 == b_uni[i, 1] selects no rows.

bbb <- parallel::parLapply(cl,1:nrow(b_uni), fun=function(i,b) {
  e<-b[b$col1==b_uni[i,1],]
  a<-e+10
  return(a)
}, b = b)

If you change it to col1, it will return a list of vectors of length 2:

lengths(bbb)
[1] 2 2 2 2 2 2 2 2 2 2
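Putting the pieces together, a complete run might look like the sketch below. It is one possible arrangement, not the only one, and it makes a few changes that are my own assumptions rather than part of the original code: b_uni is passed as an argument instead of being exported, drop = FALSE is added so each piece stays a one-column data frame, the final object is called result rather than c (to avoid masking base::c), and the cluster is stopped at the end.

```r
library(parallel)

cl <- makeCluster(2)

b <- data.frame(col1 = c(1,1,2,2,3,3,4,4,7,7,9,9,11,11,12,12,13,13,14,14))
b_uni <- unique(b)

# Why the col2 version selected nothing: a missing column is NULL,
# and comparing NULL with a value gives a zero-length logical vector.
n_matches_col2 <- length(b$col2 == 1)

bbb <- parLapply(cl, seq_len(nrow(b_uni)), function(i, b, b_uni) {
  # keep the rows equal to the i-th unique value;
  # drop = FALSE prevents the one-column result collapsing to a vector
  e <- b[b$col1 == b_uni[i, 1], , drop = FALSE]
  e + 10
}, b = b, b_uni = b_uni)

stopCluster(cl)

# stack the per-value pieces back into a single data frame
result <- na.omit(do.call(rbind, bbb))
```

With drop = FALSE each element of bbb is a two-row data frame rather than a vector, so rbind() produces a single col1 column with one row per row of b.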

Upvotes: 3
