PingPong

Reputation: 975

Strange glm() behavior in a function

Please help me understand the reproducible example below. I am trying to write a function glm_func() that calls glm(). The glm() call works fine outside of a function. However, if I pass the model formula as an argument, glm_func() fails with a strange error:

Error in eval(extras, data, env) : object 'modeldata' not found

Can someone help me understand what went wrong?

# Fully reproducible example
# Specify data
aa = data.frame(y=1:100, x1=1:100, x2=rep(1, 100), z=runif(100))
lm_formula = as.formula('y ~ x1 + x2')
weight_var = 'z'

# GLM works as-is outside of a function
model1 = glm(formula = lm_formula, data = aa, weights = aa[[weight_var]])

# Why does this function not work?
glm_func <- function(modeldata, formula, weight){
  thismodel=glm(
    formula = formula, #<----- Does not work if formula is passed from argument
    data = modeldata,  weights = modeldata[[weight]])}
glm_func(modeldata=aa, formula=lm_formula, weight=weight_var)

# This function works
glm_func2 <- function(modeldata, weight){
  thismodel=glm(
    formula = y ~ x1 + x2, #<----- Works if formula is hardcoded
    data = modeldata,  weights = modeldata[[weight]])}
glm_func2(modeldata=aa, weight=weight_var)

Upvotes: 1

Views: 97

Answers (2)

Roland

Reputation: 132676

From help("formula"):

A formula object has an associated environment, and this environment (rather than the parent environment) is used by model.frame to evaluate variables that are not found in the supplied data argument.

Formulas created with the ~ operator use the environment in which they were created. Formulas created with as.formula will use the env argument for their environment.

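From this one would expect that you don't need to care about the environment if you use the data argument. Sadly that's not the case here, because the weights expression is evaluated in the data and then in the formula's environment, not in your function's environment (thanks to user2554330 for pointing this out!). Your formula was created with as.formula at top level, so its environment is the global environment, where modeldata does not exist.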
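To make this concrete, here is a minimal sketch that inspects the environment a formula carries and reproduces the failure. The names here, f_global, f_local, make_formula and broken are made up for illustration, and aa mirrors the data from the question.

```r
# Data as in the question
aa <- data.frame(y = 1:100, x1 = 1:100, x2 = rep(1, 100), z = runif(100))

here <- environment()                    # the environment this code runs in

f_global <- as.formula("y ~ x1 + x2")    # created here, so it remembers `here`
make_formula <- function() y ~ x1 + x2   # created inside a function call
f_local <- make_formula()

identical(environment(f_global), here)   # formula env is the calling env
identical(environment(f_local), here)    # formula env is the function's frame

# Hypothetical wrapper without the fix: `modeldata` only exists inside the
# function, but the weights are looked up via the formula's environment.
broken <- function(modeldata, formula, weight) {
  glm(formula = formula, data = modeldata, weights = modeldata[[weight]])
}
msg <- tryCatch(
  { broken(aa, f_global, "z"); "no error" },
  error = function(e) conditionMessage(e)
)
msg
```

The error message captured in msg should name modeldata, which is the same symptom as in the question.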

So, you need to ensure that your function environment is associated with the formula:

glm_func <- function(modeldata, formula, weight){
  environment(formula) <- environment()
  glm(formula = formula, data = modeldata,
      weights = modeldata[[weight]])
}
glm_func(modeldata=aa, formula=lm_formula, weight=weight_var)
#works

Personally, I'd do this instead:

glm_func <- function(modeldata, formula, weight){
  environment(formula) <- environment()
  eval(
    bquote(
      glm(formula = .(formula), data = modeldata,  
          weights = modeldata[[weight]])
    )
  )
}

This way, the actual formula is printed when you print the model object.
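A rough side-by-side of what ends up in the stored call, as a sketch only. The wrapper names plain and spliced are hypothetical, and aa and lm_formula are rebuilt from the question so the snippet runs on its own.

```r
aa <- data.frame(y = 1:100, x1 = 1:100, x2 = rep(1, 100), z = runif(100))
lm_formula <- as.formula("y ~ x1 + x2")

# Variant 1: pass the argument through; the call records the symbol `formula`
plain <- function(modeldata, formula, weight) {
  environment(formula) <- environment()
  glm(formula = formula, data = modeldata, weights = modeldata[[weight]])
}

# Variant 2: splice the formula object into the call with bquote()
spliced <- function(modeldata, formula, weight) {
  environment(formula) <- environment()
  eval(bquote(
    glm(formula = .(formula), data = modeldata, weights = modeldata[[weight]])
  ))
}

m_plain   <- plain(aa, lm_formula, "z")
m_spliced <- spliced(aa, lm_formula, "z")

is.name(m_plain$call$formula)               # the call holds just a symbol
inherits(m_spliced$call$formula, "formula") # the call holds the formula itself
```

Both models are fitted the same way; only the recorded call differs, which matters when you print the model or call update() on it later.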

Upvotes: 1

Ronak Shah

Reputation: 388862

As @Roland noted, a formula object has an associated environment. So instead of passing a formula object, you can pass the variable names and create the formula inside the function:

glm_func <- function(modeldata, resp, predictor, weight){
  glm(formula = reformulate(predictor, resp), 
    data = modeldata,  weights = modeldata[[weight]])
}

glm_func(modeldata=aa, 'y', c('x1', 'x2'), weight=weight_var)
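As a quick check of why this works, the sketch below shows that reformulate() builds the formula from strings and attaches the environment it is called from, which inside a function is that function's frame. The helper make is made up for this illustration.

```r
# Hypothetical helper: build a formula inside a function and inspect it
make <- function(resp, predictor) {
  f <- reformulate(predictor, resp)
  list(
    formula  = f,
    # does the formula remember this function's frame?
    same_env = identical(environment(f), environment())
  )
}

res <- make("y", c("x1", "x2"))
res$formula    # the assembled formula
res$same_env   # whether its environment is the function's frame
```

Because the formula's environment is the function's frame, an expression like modeldata[[weight]] can be found when glm() evaluates the weights.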

Upvotes: 0
