Reputation: 2505
I encountered strange behavior when using lapply to bootstrap a GLM. Each iteration of the lapply uses a different weight, but the formula variable is the same. Thus, the latter was kept outside the anonymous function.
Below is a reproducible toy example.
The following code runs as expected:
library(dplyr)
data_adult <- read.csv("https://raw.githubusercontent.com/guru99-edu/R-Programming/master/adult.csv")
data_adult$Y <- (data_adult$hours.per.week > 40)
est_boot <- lapply(1:10, function(bb){
ff <- as.formula('Y ~ gender')
w <- rexp( nrow(data_adult), 1)
glmout <- glm( ff, 'quasibinomial', data_adult, w )
return(coef(glmout))
})
Whereas the following does not:
ff <- as.formula('Y ~ gender')
est_boot <- lapply(1:10, function(bb){
w <- rexp( nrow(data_adult), 1)
glmout <- glm( ff, 'quasibinomial', data_adult, w )
return(coef(glmout))
})
Error in eval(extras, data, env) : object 'w' not found
I thought the function might need all of its arguments to be defined locally, but data_adult is not defined locally either. Why is w not recognized when ff is defined outside the function?
I am using R 4.3.0.
Upvotes: 1
Views: 25
Reputation: 173793
In R, a formula has an attribute called .Environment, which you can see in your second version by calling
attributes(ff)
#> $class
#> [1] "formula"
#>
#> $.Environment
#> <environment: R_GlobalEnv>
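For comparison, a formula created inside a function records that function's environment instead, which is what happens in the first version in the question. A minimal sketch of this, where make_formula is a hypothetical helper used only for illustration:

```r
# make_formula is a hypothetical helper, not part of the original code.
make_formula <- function() {
  w <- 1                           # a local variable, like the weights in the question
  f <- as.formula('Y ~ gender')    # as.formula() defaults to the caller's environment
  list(f = f, env = environment())
}

res <- make_formula()

# The formula's environment is the function's own frame, not the global one ...
same_env <- identical(environment(res$f), res$env)

# ... so a local variable such as w can be found through it.
w_visible <- exists("w", envir = environment(res$f), inherits = FALSE)

print(same_env)
print(w_visible)
```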
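When glm builds the model frame, the variables named in the formula and in arguments such as weights are looked up first in data and then in the formula's .Environment. In your second version that environment is the global environment, so w, which exists only inside the anonymous function, is not found. In your first version the formula is created inside the function, so its environment is the function's own and w is visible. You can get around this by assigning the local environment to the .Environment attribute inside lapply: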
ff <- as.formula('Y ~ gender')
est_boot <- lapply(1:10, function(bb){
w <- rexp( nrow(data_adult), 1)
environment(ff) <- environment()
glmout <- glm(ff, 'quasibinomial', data_adult, w )
return(coef(glmout))
})
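Another option, if you would rather not modify the formula, is to store the weights as a column of the data, since data is searched before the formula's environment. The sketch below assumes a small synthetic data frame (dat) as a stand-in for data_adult, so that it runs without downloading anything; the column name w is also just an illustrative choice.

```r
set.seed(1)

# Synthetic stand-in for data_adult (an assumption, not the real data).
dat <- data.frame(
  Y = rbinom(200, 1, 0.4),
  gender = factor(sample(c("Female", "Male"), 200, replace = TRUE))
)

# Formula defined once, outside the anonymous function.
ff <- as.formula('Y ~ gender')

est_boot <- lapply(1:10, function(bb) {
  d <- dat
  d$w <- rexp(nrow(d), 1)   # weights are a column of the data passed to glm
  glmout <- glm(ff, family = quasibinomial, data = d, weights = w)
  coef(glmout)
})
```

This copies the data frame on every iteration, so for large data the environment(ff) <- environment() approach may be cheaper.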
Upvotes: 2