sbg

Reputation: 1772

loop inside a foreach loop using doparallel

I have a function that contains a loop

myfun = function(z1.d, r, rs){
  x = z1.d[,r]
  or.d = order(as.vector(x), decreasing=TRUE)[rs]
  zz1.d = as.vector(x)
  r.l = zz1.d[or.d]

  y=vector()
  for (i in 1:9)
  {
    if(i<9) y[i]=mean( x[(x[,r] >= r.l[i] & x[,r] < r.l[i+1]),r] ) else{
      y[i] =  mean( z1.d[(x >= r.l[9]),r] )}
  }
  return(y)
}

rs is a numeric vector, z1.d is a zoo object, and the returned y is also a numeric vector.

When I try to run the function inside a parallel loop:

cls = makePSOCKcluster(8)
registerDoParallel(cls)

rlarger.d.1  = foreach(r=1:dim(z1.d)[2], .combine = "cbind") %dopar% {    
  myfun(z1.d, r, rs)}

stopCluster(cls)

I get the following error:

Error in { : task 1 failed - "incorrect number of dimensions"

I don't know why, but I realized if I take the loop out of my function it does not give an error.

Also, if I run the exact same code with %do% instead of %dopar% (so not running in parallel), it works fine (slow, but without errors).

EDIT: as requested here is a sample of the parameters:

dim(z1.d)
[1] 8766  107
> z1.d[1:4,1:6]
                    AU_10092 AU_10622 AU_12038 AU_12046 AU_13017 AU_14015
1966-01-01 23:00:00       NA       NA       NA    1.816        0    4.573
1966-01-02 23:00:00       NA       NA       NA    9.614        0    4.064
1966-01-03 23:00:00        0       NA       NA    0.000        0    0.000
1966-01-04 23:00:00        0       NA       NA    0.000        0    0.000

> rs
[1] 300 250 200 150 100  75  50  30  10

r is defined in the foreach loop

Upvotes: 8

Views: 1934

Answers (2)

wici

Reputation: 1711

The error occurs because zoo is not loaded on your workers. The workers therefore don't know how to handle zoo objects properly; they treat them as plain matrices, which behave differently when subset. The quick fix for your stated problem is to add .packages = "zoo" to your foreach call.
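As a minimal sketch of that fix (the object z and the per-column mean are hypothetical stand-ins, not the data or function from the question), the loop body below calls zoo's coredata() on the workers. Without the .packages argument a fresh PSOCK worker would normally not have zoo attached, so the call would be expected to fail there:

```r
library(zoo)
library(doParallel)  # also attaches foreach and parallel

# Hypothetical stand-in for z1.d: a small zoo matrix with 4 columns
set.seed(1)
z <- zoo(matrix(runif(40), nrow = 10),
         order.by = as.Date("2000-01-01") + 0:9)

cls <- makePSOCKcluster(2)
registerDoParallel(cls)

# .packages = "zoo" loads zoo on every worker before the body runs,
# so zoo's methods and functions such as coredata() are available there
col_means <- foreach(r = seq_len(ncol(z)), .combine = "c",
                     .packages = "zoo") %dopar% {
  mean(coredata(z[, r]))
}

stopCluster(cls)
```

The variable z is exported to the workers automatically because it is referenced in the loop body; only the package needs to be requested explicitly.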

In my opinion you don't even need parallel computation here. You can speed up your function dramatically by working on plain numeric vectors instead of zoo objects:

# sample time series to match your object's size
set.seed(1234)
z.test <- as.zoo(replicate(107,sample(c(NA,runif(1000,0,10)),size = 8766, replace = TRUE)))

myfun_new <-  function(z, r, rs){
  x <-  as.numeric(z[,r])
  r.l <- x[order(x, decreasing=TRUE)[rs]]
  res_dim <- length(rs)
  y=numeric(res_dim)
  for (i in 1:res_dim){
    if(i< res_dim){ 
      y[i] <- mean( x[(x >= r.l[i] & x < r.l[i+1])], na.rm = TRUE ) 
    }else{
      y[i] <-   mean( x[(x >= r.l[res_dim])] , na.rm = TRUE)
    }
  }
  return(y)
}

Simple timings show the improvement:

system.time({
  cls = makePSOCKcluster(4)
  registerDoParallel(cls)
  rlarger.d.1 = foreach(r=1:dim(z.test)[2],.packages = "zoo", .combine = "cbind") %dopar% { 
    myfun(z.test, r, rs)}
  stopCluster(cls)
})
##  user      system    elapsed
##  0.08        0.10       10.93
system.time({
  res <-sapply(1:dim(z.test)[2], function(r){myfun_new(z.test, r, rs)})
})
##  user      system    elapsed
##  0.48        0.21        0.68

The results are the same (only the column names differ):

all.equal(res, rlarger.d.1, check.attributes = FALSE)
## [1] TRUE

Upvotes: 2

Istrel

Reputation: 2588

It seems there is an error in your function code.

In line 2 you create a 1-dimensional object

x = z1.d[,r]

In line 9 you treat it as a 2-dimensional one

x[some_logic, r]

That is why you get the "incorrect number of dimensions" error, although I do not know why it works in the %do% variant.
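The dimension drop can be reproduced in base R alone. This is only a sketch with a hypothetical matrix m standing in for the data, not the zoo object from the question:

```r
# Hypothetical stand-in: a plain 3 x 2 matrix
m <- matrix(1:6, nrow = 3)

# Selecting a single column drops the dim attribute: x is a plain vector
x <- m[, 1]
is.null(dim(x))   # TRUE

# Two-index subsetting on a vector fails; capture the message instead of stopping
msg <- tryCatch(x[x > 1, 1], error = function(e) conditionMessage(e))
msg               # in an English locale: "incorrect number of dimensions"

# One-index subsetting is what a vector needs
x[x > 1]          # 2 3
```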

In any case you need to replace the code inside the for loop with:

if(i<9) y[i]=mean( x[(x >= r.l[i] & x < r.l[i+1])] ) else{
      y[i] =  mean( x[(x >= r.l[9])] )}

Or with:

if(i<9) y[i]=mean( z1.d[(x >= r.l[i] & x < r.l[i+1]),r] ) else{
      y[i] =  mean( z1.d[(x >= r.l[9]),r] )}

As you did not provide a reproducible example, I did not test it.

Upvotes: 1
