SimRock

Reputation: 239

Calling the glm() function within a user-defined function

I am trying to write a function that calls glm() internally, but I always get an error. It looks like glm() cannot find the variables that I pass to my function as arguments.

set.seed(234)
sex <- sample(c("M", "F"), size=100, replace=TRUE)
age <- rnorm(n=100, mean=20 + 4*(sex=="F"), sd=0.1)
dsn <- data.frame(sex, age)
rm(sex, age) #remove sex and age from the global environment for reproducibility

to_analyze <- function(dep, indep, data){
  glm(dep~factor(indep), data=data)
}

to_analyze(dep=age, indep=sex, data=dsn)
#> Error in eval(predvars, data, env): object 'age' not found


Upvotes: 2

Views: 1044

Answers (3)

SimRock

Reputation: 239

@Onyambu and others: the substitute approach works well for a single call, as it does in to_analyze() below. However, when I call another function on the resulting model, I get an error again.

to_analyze <- function(dep, indep, data){
  glm(substitute(dep ~ factor(indep)), data=data)
}

to_analyze(dep=age, indep=sex, data=dsn)
#> 
#> Call:  glm(formula = substitute(dep ~ factor(indep)), data = data)
#> 
#> Coefficients:
#>  (Intercept)  factor(sex)M  
#>       24.006        -4.034  
#> 
#> Degrees of Freedom: 99 Total (i.e. Null);  98 Residual
#> Null Deviance:       397.3 
#> Residual Deviance: 0.8152    AIC: -191.2

However, I am stuck again: I am trying to pass the output of this model to lsmeans::lsmeans() to predict marginal means and return the result, but it gives me an error. This model does not need an offset, but I am including one here so that I end up with more general code that I can modify later. Any help would be greatly appreciated.

to_predict_lsmeans <- function(dep, indep, data){
  model <- glm(substitute(dep ~ factor(indep)), data=data)
  pred <- lsmeans:: lsmeans(model, substitute(~ factor(indep)), offset=substitute(data)$log(age), type ="response" )
  return(pred)
}

pred <- to_predict_lsmeans(dep=age, indep=sex, data=dsn)
#> Error in ref_grid(object, ...): We are unable to reconstruct the data.
#> The variables needed are:
#>  sex
#> Are any of these actually constants? (specify via 'params = ')
#> The dataset name is:
#>  data
#> Does the data still exist? Or you can specify a dataset via 'data = '
pred
#> Error in eval(expr, envir, enclos): object 'pred' not found

Upvotes: 0

Onyambu

Reputation: 79208

You could use any of the following:

Using substitute:

to_analyze <- function(dep, indep, data){
  glm(substitute(dep ~ factor(indep)), data=data)
}

to_analyze(dep=age, indep=sex, data=dsn)

Advantage: the independent argument can be an expression rather than a single variable name, e.g.:

to_analyze(Petal.Width, Sepal.Length + Sepal.Width, data = iris)

Using reformulate, as suggested by @NelsonGon:

to_analyze <- function(dep, indep, data){
  glm(reformulate(sprintf("factor(%s)", indep), dep), data = data)
}

Note that when calling this function, the variable names must be passed as character strings:

to_analyze(dep = "age", indep = "sex", data = dsn)
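To see what reformulate builds from the strings, here is a minimal sketch. It only inspects the formula and fits no model; the names "age" and "sex" are taken from the question's data.

```r
# Build the same formula that the function above passes to glm().
# sprintf() wraps the predictor name in factor(); reformulate() parses the strings.
f <- reformulate(sprintf("factor(%s)", "sex"), response = "age")

print(class(f))  # a "formula" object, not a character string
print(f)
```

Because the result is a real formula object, glm() needs no further parsing of it.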

Recall glm can also take a string that can be parsed to a formula:

to_analyze <- function(dep, indep, data){ 
  glm(sprintf("%s~factor(%s)", dep, indep),  data = data) 
}

to_analyze("age", "sex", data=dsn)

or even:

to_analyze <- function(dep, indep, data){ 
  glm(paste(dep,"~ factor(",indep,")"),  data = data) 
}

to_analyze("age", "sex", data=dsn)

Lastly, you can combine substitute and paste:

to_analyze <- function(dep, indep, data){ 
  glm(paste(substitute(dep),"~ factor(",substitute(indep),")"),  data = data) 
}

This works for both symbols and character strings, e.g.:

to_analyze(age, sex, data=dsn)
to_analyze("age", "sex", data=dsn)
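The reason both calls behave the same can be sketched with a small helper. Note that show_arg is a hypothetical name used only for illustration: substitute returns the unevaluated argument, which is a symbol in one case and a string in the other, and paste converts either to the same text.

```r
# Hypothetical helper: capture the argument without evaluating it
show_arg <- function(x) {
  captured <- substitute(x)
  list(class = class(captured), text = paste(captured))
}

sym <- show_arg(age)    # 'age' is never evaluated, so it need not exist
chr <- show_arg("age")

print(sym)
print(chr)
```

This sketch covers single names only. An expression such as a + b is captured as a call, and paste would split it into several pieces.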

Upvotes: 3

Rui Barradas

Reputation: 76402

Create a "formula" object inside the function and pass it to glm.

To capture the variable names without evaluating them (evaluating them is what raises the error), the standard trick is deparse(substitute(.)).
Then compose the formula with paste.

to_analyze <- function(dep, indep, data){
  dep <- deparse(substitute(dep))
  indep <- deparse(substitute(indep))
  indep <- paste0("factor(", indep, ")")
  fmla <- paste(dep, indep, sep = " ~ ")
  fmla <- as.formula(fmla)
  glm(fmla, data = data)
}

to_analyze(dep=age, indep=sex, data=dsn)
#
#Call:  glm(formula = fmla, data = data)
#
#Coefficients:
# (Intercept)  factor(sex)M  
#      23.984        -3.984  
#
#Degrees of Freedom: 99 Total (i.e. Null);  98 Residual
#Null Deviance:     396.2 
#Residual Deviance: 0.837   AIC: -188.5
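In the output above, the stored call reads glm(formula = fmla, data = data), so anything that later inspects the call sees only the function's local names. One possible variation, a sketch and not verified against lsmeans, is to build the call with bquote so that it records the actual formula and the caller's data name. The function name to_analyze_named is hypothetical, and the data setup is copied from the question.

```r
set.seed(234)
sex <- sample(c("M", "F"), size = 100, replace = TRUE)
age <- rnorm(n = 100, mean = 20 + 4 * (sex == "F"), sd = 0.1)
dsn <- data.frame(sex, age)
rm(sex, age)

# Hypothetical variant: the stored call names the real formula and data
to_analyze_named <- function(dep, indep, data) {
  dep <- deparse(substitute(dep))
  indep <- paste0("factor(", deparse(substitute(indep)), ")")
  fmla <- as.formula(paste(dep, indep, sep = " ~ "))
  data_name <- substitute(data)            # the symbol the caller used, e.g. dsn
  cl <- bquote(glm(.(fmla), data = .(data_name)))
  eval(cl, parent.frame())                 # evaluate where the data set lives
}

fit <- to_analyze_named(dep = age, indep = sex, data = dsn)
print(fit$call)
```

Whether this is enough for a downstream function to reconstruct the data depends on how that function looks it up, so treat it as a starting point.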

Upvotes: 1
