Reputation: 2988
I am working on a custom function that includes a call to lm(), but for some reason the function fails. I can't make any sense of why it fails.
Consider this example, simplified to the bare bones:
myfun <- function(form., data., subs., ...){
  lm(form., data., subs., ...)
}
Calling it produces an error:
myfun(mpg ~ cyl + hp, mtcars, TRUE)
## Error in eval(expr, envir, enclos) : object 'subs.' not found
However, using lm() directly works just fine:
lm(mpg ~ cyl + hp, mtcars, TRUE)
##
## Call:
## lm(formula = mpg ~ cyl + hp, data = mtcars, subset = TRUE)
##
## Coefficients:
## (Intercept)          cyl           hp
##    36.90833     -2.26469     -0.01912
I tried debugging, but still can't get to the bottom of the problem. Why does the custom function fail? Clearly subs. has been supplied to the function...
While most of the solutions suggested below help in this simple case, the function still fails if I add a simple twist. For instance, expand.model.frame() relies on the formula's environment, but fails if I use the standard evaluation solution:
myfun <- function(form., data., subs., ...){
  fit <- lm(form., data.[ subs., ], ...)
  expand.model.frame(fit, ~ drat)
}
myfun(mpg ~ cyl + hp, mtcars, TRUE)
## Error in eval(expr, envir, enclos) : object 'data.' not found
This is obviously related to the original issue, but I can't figure out how. Is the environment of the model formula somehow corrupted?
Upvotes: 6
Views: 1057
Reputation: 674
Building on the answer of @ErnestA, you can modify your function to ensure that subs. is present in the environment of the formula form.:
myfun <- function(form., data., subs., ...){
  assign("subs.", subs., envir=environment(form.))
  lm(form., data., subs., ...)
}
ETA: to avoid contaminating the environment of form., you can create a new environment as follows:
myfun <- function(form., data., subs., ...){
  environment(form.) <- new.env(parent=environment(form.))
  assign("subs.", subs., envir=environment(form.))
  lm(form., data., subs., ...)
}
ETA: perhaps the neatest way of fixing the lm issue alone is to set the environment of form. to that of myfun:
myfun <- function(form., data., subs., ...){
  environment(form.) <- environment()
  lm(form., data., subs., ...)
}
myfun(mpg ~ cyl + hp, mtcars, TRUE)
## Call:
## lm(formula = form., data = data., subset = subs.)
##
## Coefficients:
## (Intercept)          cyl           hp
##    36.90833     -2.26469     -0.01912
Turning to the expand.model.frame issue: subs. is not found, although it is in the environment that ?expand.model.frame says is used. Is this a bug in expand.model.frame, or at least a conflict with the documentation?
myfun <- function(form., data., subs., ...){
  environment(form.) <- environment()
  fit <- lm(form., data., subs., ...)
  print(ls(environment(formula(fit))))
  expand.model.frame(fit, ~drat )
}
myfun(mpg ~ cyl + hp, mtcars, TRUE)
## [1] "data." "fit" "form." "subs."
## Error in eval(expr, envir, enclos) : object 'subs.' not found
Putting subs. into the parent environment seems to work:
myfun <- function(form., data., subs., ...){
  environment(form.) <- environment()
  fit <- lm(form., data., subs., ...)
  assign("subs.", subs., envir = parent.env(environment(formula(fit))))
  expand.model.frame(fit, ~drat)
}
myfun(mpg ~ cyl + hp, mtcars, TRUE)
##                 mpg cyl  hp drat
## Mazda RX4      21.0   6 110 3.90
## Mazda RX4 Wag  21.0   6 110 3.90
## Datsun 710     22.8   4  93 3.85
## Hornet 4 Drive 21.4   6 110 3.08
## etc.
But this has the drawback of contaminating the parent environment, in this case R_GlobalEnv. I haven't been able to make it work using anything other than R_GlobalEnv as the parent.
Upvotes: 3
Reputation: 2988
As suggested in the comments, another solution would be to avoid the subset argument altogether in non-interactive use, and use standard evaluation instead:
myfun <- function(form., data., subs., ...){
  lm(form., data.[ subs., ], ...)
}
Now this works as expected:
myfun(formula(mpg ~ cyl + hp), mtcars, TRUE)
However, this still won't be enough if your custom function subsequently contains calls to expand.model.frame() or similar functions, which themselves seem to be sensitive to the non-standard evaluation of the subset argument. To make the function robust and avoid surprises, you need to both (1) define the formula within the custom function (see also the reformulate approach) and (2) subset the data prior to the lm() call while conspicuously avoiding the subset argument.
Like this:
myfun <- function(form., data., subs., ...){
  stopifnot(is.character(form.))
  data. <- data.[ subs., ]
  fit <- lm(as.formula(form.), data., ...)
  expand.model.frame(fit, ~ drat)
}
myfun("mpg ~ cyl + hp", mtcars, TRUE)
I tried using either (1) or (2), but still managed to run into strange errors from some functions, and it's only with both (1) and (2) that the errors seem to have gone away...
Upvotes: 6
Reputation: 215087
You can do something like this:
myfun <- function(form., data., subs., ...){
  lm(as.formula(form.), data., subs., ...)
}
Call it as myfun("mpg ~ cyl + hp", mtcars, TRUE). Passing the formula as a string forces it to be created in the environment of the function myfun, which then contains subs.
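A small check of that claim, offered as a sketch (show_env is a hypothetical helper, not part of the answer): as.formula() applied to a character string gives the formula the calling frame as its environment, whereas a formula object passed in is returned unchanged and keeps the environment in which it was written.

```r
# Hypothetical helper: reports whether the formula's environment is this
# function's own frame, and whether `subs.` is bound directly in it.
show_env <- function(form., subs.) {
  f <- as.formula(form.)
  c(
    env_is_function_frame = identical(environment(f), environment()),
    subs_bound_in_env     = exists("subs.", envir = environment(f),
                                   inherits = FALSE)
  )
}

from_string  <- show_env("mpg ~ cyl + hp", TRUE)  # formula built inside
from_formula <- show_env(mpg ~ cyl + hp, TRUE)    # formula built by caller

from_string
from_formula
```

In a fresh session both entries should be TRUE for the string and FALSE for the formula object, which is why this approach depends on the caller passing a character string.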
Upvotes: 3
Reputation: 7839
The function doesn't work because of the way the subset argument is evaluated:
All of ‘weights’, ‘subset’ and ‘offset’ are evaluated in the same way as variables in ‘formula’, that is first in ‘data’ and then in the environment of ‘formula’.
In other words, lm looks for a variable named subs. in data and then in the environment of formula, and since there is no subs. variable in either of those places it produces an error.
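The lookup can be made visible with a short diagnostic. This is only a sketch: probe is a hypothetical helper, and the check assumes a fresh session in which no object named subs. exists in the calling environment.

```r
# Hypothetical helper: reports where a variable called `subs.` can be seen.
probe <- function(form., data., subs.) {
  c(
    # 1. Is it a column of the data? (searched first)
    in_data         = "subs." %in% names(data.),
    # 2. Is it visible from the formula's environment? (searched second)
    in_formula_env  = exists("subs.", envir = environment(form.)),
    # 3. Is it in the function's own frame? (not searched at all)
    in_function_env = exists("subs.", envir = environment(),
                             inherits = FALSE)
  )
}

res <- probe(mpg ~ cyl + hp, mtcars, TRUE)
res
```

Only the third entry should be TRUE: the formula was created by the caller, so its environment is the caller's, and the function's own frame, the one place where subs. lives, is never consulted.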
Upvotes: 4