Reputation: 279
See the edit at the end for a reproducible example.
Problem description
When I run boot::censboot(data, statistic, parallel = "multicore", ncpus = 2, var = whatEver), where I've defined statistic <- function(data, var), I get error messages of the form FUN(X[[i]], ...) : unused argument (var = whatEver). The issue is that statistic is not able to see the value of var.
This does not happen when I call boot::censboot(data, statistic, parallel = "no").
By debugging I can see that:
If parallel = "no", boot::censboot is running something like this:
stat <- function(r, s){r + s}
main <- function(...)
{
  fn <- {function(r) stat(r, ...)}
  lapply(1:2, fn)
}
main(s = 2)
Output:
[[1]]
[1] 3
[[2]]
[1] 4
In this case stat is indeed able to see that s = 2, even though fn is only a function of r (and not of r AND ...).
But if parallel = "multicore", ncpus = 2, then boot::censboot runs something like this (note that the only difference from the above code block is the ... in lapply):
stat <- function(r, s){r + s}
main <- function(...)
{
  fn <- {function(r) stat(r, ...)}
  lapply(1:2, fn, ...)
}
main(s = 2)
Output:
Error in FUN(X[[i]], ...) : unused argument (s = 2)
In this case stat is NOT able to see that s = 2. This is the root cause of the unused argument error messages.
(Of course, in reality boot::censboot calls parallel::mclapply rather than lapply to parallelise, but the issue pertains to the use of .... My understanding is that ... means exactly the same thing in lapply as in parallel::mclapply, since I'm able to reproduce the error message from boot::censboot in the reprexes above.)
Questions:
1. Why is stat able to see that s = 2 in the ordinary case, where lapply doesn't actually pass along arguments using ...? And why is this no longer true when lapply does use ... in the parallel case?
2. I cannot change main, which represents code defined in boot::censboot. How can I change stat so that it works in both cases?
Edit: added reproducible example
As requested by a commenter below, here is an example that reproduces the error in the parallel case. If you set parallel = "no", ncpus = 1 in boot::censboot, the code works as you would expect.
library(boot)
library(survival)
data(aml, package = "boot")
statMeanSurv <- function(data, var) {
  surv <- survfit(Surv(time, cens) ~ 1, data = data)
  mean(surv$surv) + var
}
res <- censboot(aml, statMeanSurv, R = 5,
                var = 1, parallel = "multicore", ncpus = 2)
res$t
Output:
> res <- censboot(aml, statMeanSurv, R = 5,
+ var = 1, parallel = "multicore", ncpus = 2)
Warning message:
In parallel::mclapply(seq_len(R), fn, ..., mc.cores = ncpus) :
all scheduled cores encountered errors in user code
>
> res$t
[,1]
[1,] "Error in FUN(X[[i]], ...) : unused argument (var = 1)\n"
[2,] "Error in FUN(X[[i]], ...) : unused argument (var = 1)\n"
[3,] "Error in FUN(X[[i]], ...) : unused argument (var = 1)\n"
[4,] "Error in FUN(X[[i]], ...) : unused argument (var = 1)\n"
[5,] "Error in FUN(X[[i]], ...) : unused argument (var = 1)\n"
Upvotes: 0
Views: 137
Reputation: 44877
This is a rewrite of the original post that gives a better explanation of what went wrong and fixes a possible bug in the workaround.
That looks like a bug in censboot. It doesn't handle the ... parameter correctly. (More explanation below.) The reason you don't get an error with parallel = 'no' is that the code follows a different path.
A workaround is to use "partial application" to create a 1-parameter statistic function, like this:
library(boot)
library(survival)
#>
#> Attaching package: 'survival'
#> The following object is masked from 'package:boot':
#>
#> aml
data(aml, package = "boot")
statMeanSurv <- function(data, var) {
  surv <- survfit(Surv(time, cens) ~ 1, data = data)
  mean(surv$surv) + var
}
# Partial application: returns a 1-parameter statistic with var fixed
statMeanSurv1 <- function(var) {
  force(var) # Fix the value of var
  function(data) statMeanSurv(data, var)
}
res <- censboot(aml, statMeanSurv1(var = 1), R = 5,
                parallel = "multicore", ncpus = 2)
res$t
#> [,1]
#> [1,] 1.564580
#> [2,] 1.503473
#> [3,] 1.602111
#> [4,] 1.440942
#> [5,] 1.594482
Created on 2021-02-04 by the reprex package (v0.3.0)
Internally, the problem in censboot is that it does something like my workaround, but then it also passes ... to its equivalent of statMeanSurv1, and that's an error: it can only accept 1 argument.
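To make that concrete, here is a minimal sketch using plain lapply in place of parallel::mclapply. The names main_seq and main_par are hypothetical stand-ins for the two code paths, and their bodies are only an assumption about the general shape of the code, based on the simplified example in the question; they are not copied from censboot.

```r
stat <- function(r, s) r + s

# Sequential-style path: `...` is forwarded once, inside fn.
main_seq <- function(statistic, ...) {
  fn <- function(r) statistic(r, ...)
  lapply(1:2, fn)
}

# Parallel-style path: `...` is forwarded inside fn AND handed to lapply,
# which passes it on to fn. But fn is a function of r only.
main_par <- function(statistic, ...) {
  fn <- function(r) statistic(r, ...)
  lapply(1:2, fn, ...)
}

# Works: s reaches stat through the `...` captured by fn.
ok <- main_seq(stat, s = 2)

# Fails: lapply effectively calls fn(r, s = 2), and fn has no s or `...`.
err <- tryCatch(main_par(stat, s = 2),
                error = function(e) conditionMessage(e))

# Workaround: bind s in a closure, so `...` is empty on both paths.
fixed <- main_par(function(r) stat(r, s = 2))
```

The point of the sketch is that the failing call is the one to fn, not the one to stat, so no change to the signature of stat can fix it. Removing the extra arguments from ... before they reach the wrapper is what the closure workaround does.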
The line force(var) in statMeanSurv1 isn't necessary in the example, but in more elaborate examples it might be. It guarantees that the newly created function uses the specified value.
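As a small illustration of when force() matters, the sketch below uses two hypothetical function factories, make_lazy and make_forced (neither is part of boot). R evaluates function arguments lazily, so without force() the argument is only looked up when the returned function is first called.

```r
# Hypothetical helpers, for illustration only.
make_lazy <- function(x) {
  function(y) x + y   # x is still an unevaluated promise at this point
}
make_forced <- function(x) {
  force(x)            # evaluate x now, while the caller's value is current
  function(y) x + y
}

v <- 1
add_lazy   <- make_lazy(v)
add_forced <- make_forced(v)
v <- 100              # v changes before either closure is first called

lazy_result   <- add_lazy(0)    # uses v as it is now: 100
forced_result <- add_forced(0)  # uses v as it was at creation: 1
```

In the censboot example the value passed is the literal 1, so there is nothing that could change in between, which is why force(var) makes no difference there.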
Upvotes: 1