landroni

Reputation: 2988

Why does this simple function calling `lm(..., subset)` fail?

I am working on a custom function that includes a call to lm(), but the function fails and I can't make sense of why.

Consider this example, stripped down to the bare bones:

myfun <- function(form., data., subs., ...){
    lm(form., data., subs., ...)
}

Calling it results in an error:

myfun(mpg ~ cyl + hp, mtcars, TRUE)
## Error in eval(expr, envir, enclos) : object 'subs.' not found

However, calling lm() directly works just fine:

lm(mpg ~ cyl + hp, mtcars, TRUE)
## 
## Call:
## lm(formula = mpg ~ cyl + hp, data = mtcars, subset = TRUE)
## 
## Coefficients:
## (Intercept)          cyl           hp  
##    36.90833     -2.26469     -0.01912  

I tried debugging, but still can't get to the bottom of the problem. Why does the custom function fail? Clearly subs. has been supplied to the function...


Edit:

While most of the solutions suggested below help in this simple case, the function still fails if I add a simple twist. For instance, expand.model.frame() relies on the formula's environment, and it fails if I use the standard-evaluation solution:

myfun <- function(form., data., subs., ...){
    fit <- lm(form., data.[ subs., ], ...)
    expand.model.frame(fit, ~ drat)
}

myfun(mpg ~ cyl + hp, mtcars, TRUE)
## Error in eval(expr, envir, enclos) : object 'data.' not found

This is obviously related to the original issue, but I can't figure out how. Is the environment of the model formula somehow corrupted?

Upvotes: 6

Views: 1057

Answers (4)

user20637

Reputation: 674

Building on the answer of @ErnestA, you can modify your function to ensure that subs. is present in the environment of the formula form.:

myfun <- function(form., data., subs., ...){
    assign("subs.", subs., envir=environment(form.))
    lm(form., data., subs., ...)
}

ETA: to avoid contaminating the environment of form., you can create a new child environment:

myfun <- function(form., data., subs., ...){
    environment(form.) <- new.env(parent=environment(form.))
    assign("subs.", subs., envir=environment(form.))
    lm(form., data., subs., ...)
}

ETA: perhaps the neatest way of fixing the lm issue alone is to set the environment of form. to that of myfun:

myfun <- function(form., data., subs., ...){
    environment(form.) <- environment()
    lm(form., data., subs., ...)
}
myfun(mpg ~ cyl + hp, mtcars, TRUE)
## Call:
##   lm(formula = form., data = data., subset = subs.)
## 
## Coefficients:
##   (Intercept)          cyl           hp  
##      36.90833     -2.26469     -0.01912  

Turning to the expand.model.frame issue: subs. is not found, although it is in the environment that ?expand.model.frame says is used. Is this a bug in expand.model.frame, or at least a conflict with the documentation?

myfun <- function(form., data., subs., ...){
    environment(form.) <- environment()
    fit <- lm(form., data., subs., ...)
    print(ls(environment(formula(fit))))
    expand.model.frame(fit, ~drat)
}
myfun(mpg ~ cyl + hp, mtcars, TRUE)
## [1] "data." "fit"   "form." "subs."
##  Error in eval(expr, envir, enclos) : object 'subs.' not found

Putting subs. into the parent environment seems to work.

myfun <- function(form., data., subs., ...){
    environment(form.) <- environment()
    fit <- lm(form., data., subs., ...)
    assign("subs.", subs., envir = parent.env(environment(formula(fit))))
    expand.model.frame(fit, ~drat)
}
myfun(mpg ~ cyl + hp, mtcars, TRUE)
##                      mpg cyl  hp drat
## Mazda RX4           21.0   6 110 3.90
## Mazda RX4 Wag       21.0   6 110 3.90
## Datsun 710          22.8   4  93 3.85
## Hornet 4 Drive      21.4   6 110 3.08
## etc.

But this has the drawback of contaminating the parent environment, in this case R_GlobalEnv. I haven't been able to make it work using anything other than R_GlobalEnv as the parent.
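One possible way around the global assignment, offered as a sketch rather than a definitive fix: splice the value of subs. into the lm() call with bquote(), so that fit$call$subset stores the value itself instead of the symbol subs., which would otherwise have to be looked up again later. This assumes the failure above comes from that second lookup. The name myfun_spliced is hypothetical, and forwarding of extra arguments via ... is left out to keep the sketch short.

```r
# Hypothetical variant of myfun: the *value* of subs. is spliced into the call.
myfun_spliced <- function(form., data., subs.) {
    # Make data. reachable through the formula's environment later on.
    environment(form.) <- environment()
    # bquote() replaces .(subs.) with its value before lm() is evaluated,
    # so the stored call no longer refers to a variable named subs.
    fit <- eval(bquote(lm(form., data., subset = .(subs.))))
    expand.model.frame(fit, ~ drat)
}

out <- myfun_spliced(mpg ~ cyl + hp, mtcars, mtcars$cyl > 4)
head(out)
```

The trade-off is that the stored call now contains the whole subset vector, so printing the fitted model's call becomes verbose.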

Upvotes: 3

landroni

Reputation: 2988

As suggested in the comments, another solution would be to avoid the subset argument altogether in non-interactive use, and use standard evaluation instead:

myfun <- function(form., data., subs., ...){
    lm(form., data.[ subs., ], ...)
}

Now this works as expected:

myfun(formula(mpg ~ cyl + hp), mtcars, TRUE)

However, this still won't be enough if your custom function subsequently calls expand.model.frame() or similar functions, which seem to be sensitive themselves to the non-standard evaluation of the subset argument. To make the function robust and avoid surprises, you need to both (1) define the formula within the custom function (see also the reformulate approach) and (2) subset the data before the lm() call, deliberately avoiding the subset argument.

Like this:

myfun <- function(form., data., subs., ...){
    stopifnot(is.character(form.))
    data. <- data.[ subs., ]
    fit <- lm(as.formula(form.), data., ...)
    expand.model.frame(fit, ~ drat)
}

myfun("mpg ~ cyl + hp", mtcars, TRUE)

I tried using either (1) or (2), but still managed to run into strange errors from some functions, and it's only with both (1) and (2) that the errors seem to have gone away...
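For illustration, the reformulate approach mentioned above could be combined with pre-subsetting roughly as follows. This is only a sketch: the function name myfun_reform and the arguments response. and terms. are hypothetical and not part of the original question.

```r
# Hypothetical variant: the formula is built inside the function with
# reformulate(), and the data are subset before lm() is called.
myfun_reform <- function(response., terms., data., subs., ...) {
    # (1) The formula is created here, so its environment is this function's.
    form. <- reformulate(terms., response = response.)
    # (2) Standard evaluation: subset the data frame up front.
    data. <- data.[subs., ]
    fit <- lm(form., data., ...)
    expand.model.frame(fit, ~ drat)
}

out <- myfun_reform("mpg", c("cyl", "hp"), mtcars, mtcars$cyl > 4)
head(out)
```

Because fit$call then refers only to form. and data., both of which live in the function's own environment, expand.model.frame() has nothing left to look up elsewhere.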

Upvotes: 6

akuiper

Reputation: 215087

You can do something like this:

myfun <- function(form., data., subs., ...){
    lm(as.formula(form.), data., subs., ...)
}

Call it as myfun("mpg ~ cyl + hp", mtcars, TRUE). Passing the formula as a string forces it to be created inside myfun, so the formula's environment is that of the function, which contains the variable subs. and lets lm() find it.
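A quick way to check this behaviour is to compare the formula's environment with the function's own. The helper name env_is_local below is hypothetical. Note that the trick only applies to character input: as.formula() returns an existing formula object unchanged, with its original environment.

```r
# Hypothetical helper: does as.formula() give a formula whose environment
# is the environment of the calling function?
env_is_local <- function(form.) {
    f <- as.formula(form.)
    identical(environment(f), environment())
}

from_string  <- env_is_local("mpg ~ cyl + hp")  # formula built inside the function
from_formula <- env_is_local(mpg ~ cyl + hp)    # formula built by the caller

from_string
from_formula
```

So a wrapper relying on this approach would probably want to insist on character input, for example with stopifnot(is.character(form.)).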

Upvotes: 3

Ernest A

Reputation: 7839

This function doesn't work because of the way the subset argument is evaluated. From ?lm:

All of ‘weights’, ‘subset’ and ‘offset’ are evaluated in the same way as variables in ‘formula’, that is first in ‘data’ and then in the environment of ‘formula’.

In other words, lm looks for a variable named subs. in data and then in the environment of the formula. The formula was created where myfun was called, not inside myfun, so its environment does not contain subs. either, and the lookup fails with an error.
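The lookup rule can be made visible with a small experiment. In this sketch, make_formula is a hypothetical helper that creates the formula in an environment that happens to contain a variable called subs.; lm() then picks up that variable and silently ignores the value passed to myfun.

```r
myfun <- function(form., data., subs., ...) {
    lm(form., data., subs., ...)
}

# Formula created at the call site: no subs. in its environment, so lm() fails.
msg <- tryCatch(myfun(mpg ~ hp, mtcars, TRUE),
                error = function(e) conditionMessage(e))
msg

# Hypothetical helper: the formula's environment now contains a subs. variable.
make_formula <- function() {
    subs. <- mtcars$cyl > 4
    mpg ~ hp
}

# The TRUE passed here is never used; the subs. from make_formula() wins.
fit <- myfun(make_formula(), mtcars, TRUE)
nobs(fit) == sum(mtcars$cyl > 4)
```

This is also why the failure is easy to miss in interactive sessions: if a variable with the same name exists in the global environment, the call succeeds but may use the wrong subset.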

Upvotes: 4
