harisf

Reputation: 279

lapply - The difference between passing and not passing arguments when named argument exists in the environment

See the edit at the end for a reproducible example.

Problem description

When I run boot::censboot(data, statistic, parallel = "multicore", ncpus = 2, var = whatEver), where statistic is defined as function(data, var), I get errors of the form FUN(X[[i]], ...) : unused argument (var = whatEver). It looks as if statistic cannot see the value of var.

This does not happen when I call boot::censboot(data, statistic, parallel = "no").

By debugging I can see that:

Questions:

  1. Why does this happen? How is stat able to see that s = 1 in the ordinary case, where lapply doesn't actually pass along arguments using ...? And why is this not true anymore when lapply does use ... in the parallel case?
  2. I cannot change the internals of main, which represents code defined in boot::censboot. How can I change stat so that it works in both cases?

Edit: added reproducible example

As requested by a commenter below, here is an example that reproduces the error in the parallel case. If you set parallel = "no", ncpus = 1 in boot::censboot the code works as you would expect.

library(boot)
library(survival)
data(aml, package = "boot") 

statMeanSurv <- function(data, var) {
  surv <- survfit(Surv(time, cens) ~ 1, data = data)
  mean(surv$surv) + var
}

res <- censboot(aml, statMeanSurv, R = 5,
                var = 1, parallel = "multicore", ncpus = 2)

res$t

Output:

> res <- censboot(aml, statMeanSurv, R = 5,
+                 var = 1, parallel = "multicore", ncpus = 2)
Warning message:
In parallel::mclapply(seq_len(R), fn, ..., mc.cores = ncpus) :
  all scheduled cores encountered errors in user code
> 
> res$t
     [,1]                                                     
[1,] "Error in FUN(X[[i]], ...) : unused argument (var = 1)\n"
[2,] "Error in FUN(X[[i]], ...) : unused argument (var = 1)\n"
[3,] "Error in FUN(X[[i]], ...) : unused argument (var = 1)\n"
[4,] "Error in FUN(X[[i]], ...) : unused argument (var = 1)\n"
[5,] "Error in FUN(X[[i]], ...) : unused argument (var = 1)\n"

Upvotes: 0

Views: 137

Answers (1)

user2554330

Reputation: 44877

This is a rewrite of the original post that gives a better explanation of what went wrong and fixes a possible bug in the workaround.

That looks like a bug in censboot. It doesn't handle the ... parameter correctly. (More explanation below.) The reason you don't get an error with parallel = 'no' is that the code follows a different path.

A workaround is to use "partial application" to create a 1-parameter statistic function, like this:

library(boot)
library(survival)
#> 
#> Attaching package: 'survival'
#> The following object is masked from 'package:boot':
#> 
#>     aml
data(aml, package = "boot") 

statMeanSurv <- function(data, var) {
  surv <- survfit(Surv(time, cens) ~ 1, data = data)
  mean(surv$surv) + var
}

statMeanSurv1 <- function(var) { 
  force(var)   # Fix the value of var
  function(data) statMeanSurv(data, var)   # one-argument statistic with var fixed
}

res <- censboot(aml, statMeanSurv1(var = 1), R = 5,
                parallel = "multicore", ncpus = 2)

res$t
#>          [,1]
#> [1,] 1.564580
#> [2,] 1.503473
#> [3,] 1.602111
#> [4,] 1.440942
#> [5,] 1.594482

Created on 2021-02-04 by the reprex package (v0.3.0)

Internally, censboot does something like my workaround: it wraps statistic in a one-argument function that already supplies the extra arguments. The problem is that in the parallel path it then also passes ... to that wrapper, and that's an error, because the wrapper can only accept 1 argument.
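
To make that failure mode concrete, here is a minimal base-R sketch of the pattern. It is not the actual boot source: main, stat and the pass_dots switch are hypothetical stand-ins, and plain lapply is used in place of parallel::mclapply so the error surfaces directly.

```r
# Hypothetical stand-ins for illustration only; not the code in boot::censboot.
stat <- function(data, var) mean(data) + var

main <- function(data, statistic, R, ..., pass_dots = FALSE) {
  # One-argument wrapper. It picks up main()'s `...` lexically, so the
  # statistic already receives var without lapply() passing anything along.
  fn <- function(r) statistic(sample(data, replace = TRUE), ...)
  if (pass_dots) {
    # Mimics the failing path: lapply() calls fn(X[[i]], var = 1),
    # but fn() only has the single formal argument r.
    lapply(seq_len(R), fn, ...)
  } else {
    # Mimics the working path: nothing extra is handed to fn().
    lapply(seq_len(R), fn)
  }
}

set.seed(1)
ok  <- main(1:10, stat, R = 3, var = 1)
bad <- tryCatch(
  main(1:10, stat, R = 3, var = 1, pass_dots = TRUE),
  error = function(e) conditionMessage(e)
)
```

Under these assumptions, ok holds three numeric results while bad holds an "unused argument" message. This also suggests an answer to the first question: in the non-parallel case the statistic sees var because the wrapper captured ... when it was created, not because lapply forwarded it.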

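The line force(var) in statMeanSurv1 isn't necessary in the example, but in more elaborate examples it might be. R evaluates function arguments lazily, so without it var would only be evaluated the first time the new function is called. force(var) guarantees that the newly created function uses the value specified when statMeanSurv1 was called.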
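
As a small illustration of when force() matters, the sketch below builds two closures from the same variable and then changes that variable before either closure is called. The names make_lazy, make_forced and v are made up for this example.

```r
# Hypothetical helper names; they only illustrate lazy vs. forced arguments.
make_lazy <- function(var) {
  function(data) data + var      # var is still an unevaluated promise here
}

make_forced <- function(var) {
  force(var)                     # evaluate var now, at creation time
  function(data) data + var
}

v <- 1
lazy   <- make_lazy(v)
forced <- make_forced(v)
v <- 100                         # changed before either closure is called

lazy_result   <- lazy(0)         # promise is evaluated now, sees v = 100
forced_result <- forced(0)       # value was fixed at creation, uses v = 1
```

The lazy version picks up whatever v happens to be at the first call, which is the kind of surprise force() is there to prevent.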

Upvotes: 1
