Reputation: 1772
I have a function that contains a loop
myfun = function(z1.d, r, rs){
  x = z1.d[,r]
  or.d = order(as.vector(x), decreasing=TRUE)[rs]
  zz1.d = as.vector(x)
  r.l = zz1.d[or.d]
  y=vector()
  for (i in 1:9)
  {
    if(i<9) y[i]=mean( x[(x[,r] >= r.l[i] & x[,r] < r.l[i+1]),r] ) else{
      y[i] = mean( z1.d[(x >= r.l[9]),r] )}
  }
  return(y)
}
rs is a numeric vector, z1.d is a zoo object, and y is also a numeric vector.
When I try to run the function inside a parallel loop:
cls = makePSOCKcluster(8)
registerDoParallel(cls)
rlarger.d.1 = foreach(r=1:dim(z1.d)[2], .combine = "cbind") %dopar% {
myfun(z1.d, r, rs)}
stopCluster(cls)
I get the following error:
Error in { : task 1 failed - "incorrect number of dimensions"
I don't know why, but I realized if I take the loop out of my function it does not give an error.
Also, if I run the exact same code with %do% instead of %dopar% (so not running in parallel), it works fine (slow, but without errors).
EDIT: as requested here is a sample of the parameters:
dim(z1.d)
[1] 8766 107
> z1.d[1:4,1:6]
                    AU_10092 AU_10622 AU_12038 AU_12046 AU_13017 AU_14015
1966-01-01 23:00:00       NA       NA       NA    1.816        0    4.573
1966-01-02 23:00:00       NA       NA       NA    9.614        0    4.064
1966-01-03 23:00:00        0       NA       NA    0.000        0    0.000
1966-01-04 23:00:00        0       NA       NA    0.000        0    0.000
> rs
[1] 300 250 200 150 100 75 50 30 10
r is defined in the foreach loop
Upvotes: 8
Views: 1934
Reputation: 1711
The error occurs because zoo is not loaded on your workers. The workers therefore don't know how to handle zoo objects properly; they treat them as plain matrices, which don't behave the same way when subsetting!
So the quick fix for your stated problem is to add .packages = "zoo" to your foreach call.
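To illustrate the mechanism, here is a minimal base-R sketch. The class name toy_series and its `[` method are hypothetical stand-ins for zoo, on the assumption that zoo's own `[` method tolerates a column index on a univariate series. Defining the method plays the role of a session with zoo loaded; removing it plays the role of a worker that never loaded zoo.

```r
# Hypothetical class "toy_series": stands in for zoo in this illustration only.
x <- structure(c(1.5, 2.5, 3.5), class = "toy_series")

# With the method defined (analogous to a session where zoo is loaded),
# a two-index subset of a 1-d object is tolerated: the column index is ignored.
`[.toy_series` <- function(x, i, j, ...) {
  unclass(x)[i]
}
with_method <- x[c(TRUE, FALSE, TRUE), 1]

# Without the method (analogous to a worker that never loaded zoo),
# R falls back to default subsetting, which rejects two indices on a vector.
rm(`[.toy_series`)
without_method <- tryCatch(
  x[c(TRUE, FALSE, TRUE), 1],
  error = function(e) conditionMessage(e)
)

print(with_method)
print(without_method)
```

The second subset fails with the same "incorrect number of dimensions" message reported in the question, which is consistent with the code working under %do% (zoo loaded in the main session) but not under %dopar%.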
In my opinion you don't even need parallel computation. You can speed up your function dramatically by using numeric vectors instead of zoo objects:
# sample time series to match your object's size
set.seed(1234)
z.test <- as.zoo(replicate(107,sample(c(NA,runif(1000,0,10)),size = 8766, replace = TRUE)))
myfun_new <- function(z, r, rs){
  x <- as.numeric(z[,r])
  r.l <- x[order(x, decreasing=TRUE)[rs]]
  res_dim <- length(rs)
  y=numeric(res_dim)
  for (i in 1:res_dim){
    if(i< res_dim){
      y[i] <- mean( x[(x >= r.l[i] & x < r.l[i+1])], na.rm = TRUE )
    }else{
      y[i] <- mean( x[(x >= r.l[res_dim])] , na.rm = TRUE)
    }
  }
  return(y)
}
Simple timings show the improvement:
system.time({
cls = makePSOCKcluster(4)
registerDoParallel(cls)
rlarger.d.1 = foreach(r=1:dim(z.test)[2],.packages = "zoo", .combine = "cbind") %dopar% {
myfun(z.test, r, rs)}
stopCluster(cls)
})
##    user  system elapsed
##    0.08    0.10   10.93
system.time({
res <-sapply(1:dim(z.test)[2], function(r){myfun_new(z.test, r, rs)})
})
##    user  system elapsed
##    0.48    0.21    0.68
The results are the same (only the column names differ):
all.equal(res, rlarger.d.1, check.attributes = FALSE)
## [1] TRUE
Upvotes: 2
Reputation: 2588
It seems there is an error in your function code.
In line 2 you create a one-dimensional object:
x = z1.d[,r]
In line 9 you treat it as a two-dimensional one:
x[some_logic, r]
That is why you get the "incorrect number of dimensions" error, although I do not know why it works in the %do% variant.
In any case, you need to replace the code inside the for loop with:
if(i<9) y[i]=mean( x[(x >= r.l[i] & x < r.l[i+1])] ) else{
y[i] = mean( x[(x >= r.l[9])] )}
Or with:
if(i<9) y[i]=mean( z1.d[(x >= r.l[i] & x < r.l[i+1]),r] ) else{
y[i] = mean( z1.d[(x >= r.l[9]),r] )}
As you did not provide a reproducible example, I did not test it.
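As a rough check, here is a minimal sketch of the loop with x indexed as a one-dimensional vector throughout. The toy values for x and rs are assumptions chosen so the bin means can be verified by hand; they are not the question's data, and the loop length is taken from rs rather than hard-coded to 9.

```r
# Toy data (assumed for illustration, not from the question).
x  <- as.numeric(1:10)
rs <- c(4, 2)                       # ranks: 4th and 2nd largest values

# Thresholds: the values of x at those ranks (here 7 and 9).
r.l <- x[order(x, decreasing = TRUE)[rs]]

n <- length(rs)
y <- numeric(n)
for (i in seq_len(n)) {
  if (i < n) {
    # Mean of the values in [r.l[i], r.l[i + 1])
    y[i] <- mean(x[x >= r.l[i] & x < r.l[i + 1]])
  } else {
    # Last bin: everything at or above the final threshold
    y[i] <- mean(x[x >= r.l[n]])
  }
}
print(y)
```

With these values the first bin holds 7 and 8 and the last bin holds 9 and 10, so y comes out as 7.5 and 9.5.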
Upvotes: 1