Creating new functions with linear regression in R

I'm having trouble creating a function that calls the lm() function:

regresionLineal <- function (vardep, varindep1, varindep2, DATA) {
  lm(vardep ~ varindep1 + varindep2, data = DATA)
  }

Then I call it using data from a data frame I created previously (DATOS)...

regresionLineal(Estatura, Largo, Ancho, DATOS)

Error in eval(expr, envir, enclos) : object 'Estatura' not found
Called from: eval(expr, envir, enclos)

Any help will be welcome...

Upvotes: 2


Answers (4)

asifzuba

Reputation: 470

Just thought I'd add to this for any future reader.

The solution I came up with (which is not perfect) is the following function:

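f <- function(y, x1, x2, df) {
  # Build the lm() call as text; deparse1(substitute(df)) recovers the name of the data frame
  cmd <- paste0("lm(", y, " ~ ", x1, " + ", x2, ", data = ", deparse1(substitute(df)), ")")
  # Parse the text into an expression and evaluate it
  eval(parse(text = cmd))
}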

By doing this you can call, for example,

R> f("mpg", "hp", "wt", mtcars)
Call:
lm(formula = mpg ~ hp + wt, data = mtcars)
Coefficients:
(Intercept)           hp           wt  
    37.2273      -0.0318      -3.8778

The main advantage over other approaches is that the printed output of lm() keeps the actual names of the variables and of the data frame, rather than the names of the function's arguments.

Future readers should note that understanding this function requires some knowledge of the base R functions parse, deparse1, substitute and eval.

Thanks!

Upvotes: 1

Jonathan Aron

Reputation: 25

If you want to create a model with an arbitrary number of independent variables, you can use the following:

create_lm <- function(data, dep, covs) {
  # Create the first part of the formula with the dependent variable
  form_base <- paste(dep, "~")
  # Create a string that concatenates your covs vector with a "+" between each variable
  form_vars <- paste(covs, collapse = " + ")
  # Paste the two parts together
  formula <- paste(form_base, form_vars)
  # Call the lm function on your formula
  lm(formula, data = data)
}

For instance, using the built-in mtcars dataset:

create_lm(mtcars, "mpg", c("wt", "cyl"))

Call:
lm(formula = formula, data = data)

Coefficients:
(Intercept)           wt          cyl  
     39.686       -3.191       -1.508  

The downside is that the printed output from the model doesn't reflect the particular call you made to lm: it shows the argument names (formula, data) instead of the actual formula and dataset. I'm not sure if there is any way around this.
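One possible way around the printed-call issue, offered as a minimal sketch rather than a definitive fix: build the call with the real formula and the caller's data name before evaluating it. The name create_lm2 is hypothetical; reformulate(), bquote(), substitute() and eval() are base/stats R functions.

```r
# Hypothetical variant of create_lm(); the name create_lm2 is illustrative only
create_lm2 <- function(data, dep, covs) {
  # Capture the expression the caller supplied for `data` (e.g. the symbol mtcars)
  data_expr <- substitute(data)
  # reformulate() builds a formula object from character vectors
  form <- reformulate(covs, response = dep)
  # Splice the formula and the data expression into an lm() call
  cl <- bquote(lm(.(form), data = .(data_expr)))
  # Evaluate in the caller's environment so the data name can be resolved there
  eval(cl, parent.frame())
}

fit <- create_lm2(mtcars, "mpg", c("wt", "cyl"))
fit$call
```

With this construction the stored call should carry the formula and the dataset name as the caller wrote them, at the cost of some non-standard evaluation inside the function.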

Upvotes: 1

Sean

Reputation: 11

Also, you may already know this, but keep in mind that the regression object created here won't be available outside of the function unless you store it somewhere. The function does return the fitted model, so the usual approach is to assign the result where you call it: model1 <- regresionLineal(...).

If you instead want the function itself to create the object in the global environment, assign it inside the function with <<-:

model1 <<- lm(paste(vardep, "~", varindep1, "+", varindep2), data = DATA)
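As an alternative to <<-, an R function returns the value of its last expression, so the fit can be captured by ordinary assignment at the call site. A minimal sketch, assuming the string-based version of regresionLineal from this thread and the built-in trees dataset:

```r
# Variable names are passed as strings and pasted into a formula string
regresionLineal <- function(vardep, varindep1, varindep2, DATA) {
  lm(paste(vardep, "~", varindep1, "+", varindep2), data = DATA)
}

# Capture the returned model in the calling environment
model1 <- regresionLineal("Height", "Girth", "Volume", trees)

# The object is now available for later use
coef(model1)
```

This keeps the function free of side effects, which tends to be easier to reason about than writing into the global environment.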

Upvotes: 1

Zheyuan Li

Reputation: 73265

You should do:

regresionLineal <- function (vardep, varindep1, varindep2, DATA) {
  lm(paste(vardep, "~", varindep1, "+", varindep2), data = DATA)
  }

where you pass in vardep, varindep1, varindep2 as strings. As an example, I use R's built-in trees dataset:

regresionLineal("Height", "Girth", "Volume", trees)
# Call:
# lm(formula = paste(vardep, "~", varindep1, "+", varindep2), data = DATA)

# Coefficients:
# (Intercept)        Girth       Volume  
#     83.2958      -1.8615       0.5756  

However, I don't see why we would bother doing this. If we have to specify every variable in the formula, why not simply pass in a complete formula? And in that case, you can use lm() directly without defining your own function.
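To illustrate the point about passing a complete formula, here is a small sketch. The wrapper name fit_formula is hypothetical and only there for comparison; the direct lm() call does the same job.

```r
# Hypothetical wrapper that takes a complete formula instead of separate names
fit_formula <- function(form, DATA) {
  lm(form, data = DATA)
}

fit_wrapped <- fit_formula(Height ~ Girth + Volume, trees)

# Equivalent direct call, no wrapper needed
fit_direct <- lm(Height ~ Girth + Volume, data = trees)

# Both fits give the same coefficients
all.equal(coef(fit_wrapped), coef(fit_direct))
```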

Upvotes: 9
