Daniel

Reputation: 11

R - Automatically adjust the model formula

I am trying to find a way to automatically adjust the model formula that R uses to fit a model. Here is a simple example. In the code below I want to choose whether "a" and "b" are included in the model by setting include.a and include.b. If a flag is TRUE, the corresponding variable should be added to the model formula; otherwise it should be left out.

x=1:10
y=2:11
y[9] = y[9]+1

a = rep(3, times = 10)
a[7] = 7
b = c(3:10, 10, 10)

include.a = FALSE
include.b = TRUE

# desired result: the model y ~ x + b (the attempt below is not valid R)
model = lm(y ~ x 
           if(include.b == TRUE){+ b)}
           )

I've searched this site extensively but cannot find any hints.

Upvotes: 1

Views: 117

Answers (2)

G. Grothendieck

Reputation: 269371

1) Use reformulate as shown:

fo <- reformulate(c("x", if (include.a) "a", if (include.b) "b"), "y")
lm(fo)

giving:

Call:
lm(formula = fo)

Coefficients:
(Intercept)            x            b  
    1.06154      1.10769     -0.07692  
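The reason the inline `if` works here can be sketched as follows: an `if` without `else` evaluates to NULL when its condition is FALSE, and c() drops NULL elements, so only the selected names reach reformulate(). The name `term_labels` is only an illustrative choice, not part of the original answer.

```r
include.a <- FALSE
include.b <- TRUE

# An `if` with no `else` evaluates to NULL when the condition is FALSE
is.null(if (include.a) "a")

# c() drops NULL elements, so only the selected variable names remain
term_labels <- c("x", if (include.a) "a", if (include.b) "b")
term_labels

# reformulate() joins the term labels with "+" and attaches the response
fo <- reformulate(term_labels, response = "y")
fo
```

With the flags above, `term_labels` contains "x" and "b", and `fo` is the formula y ~ x + b.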

2) Alternatively, call lm via do.call like this:

do.call("lm", list(fo))

giving a nicer Call: line:

Call:
lm(formula = y ~ x + b)

Coefficients:
(Intercept)            x            b  
    1.06154      1.10769     -0.07692  
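As a rough check of the difference between the two calls, the sketch below fits the model both ways using the question's data. do.call() evaluates the elements of its argument list before building the call, so the formula itself, rather than the symbol `fo`, ends up in the stored call. The names `fit1` and `fit2` are illustrative only.

```r
# Data from the question
x <- 1:10
y <- 2:11
y[9] <- y[9] + 1
b <- c(3:10, 10, 10)

fo <- reformulate(c("x", "b"), response = "y")

fit1 <- lm(fo)                    # stored call refers to the symbol `fo`
fit2 <- do.call("lm", list(fo))   # stored call contains the formula itself

fit1$call
fit2$call

# The fitted coefficients are the same either way
all.equal(coef(fit1), coef(fit2))
```

Only the recorded call differs; the fitted models are the same.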

3) Also consider a design where a single character vector v of variable names is provided.

v <- "b"
fo <- reformulate(c("x", v), "y")
lm(fo)

v <- c("a", "b")
fo <- reformulate(c("x", v), "y")
lm(fo)

v <- c()
fo <- reformulate(c("x", v), "y")
lm(fo)

In a function it would be written like this:

my_lm <- function(v = c(), resp = "y", indep = "x", env = parent.frame()) {
  fo <- reformulate(c(indep, v), resp, env = env)
  do.call("lm", list(fo))
}

my_lm("b")

Upvotes: 0

jpsmith

Reputation: 17174

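One option is to define a character vector with the desired covariate names, build a formula from it with as.formula(), and then pass that formula to lm(). Note that the example below models x ~ y; swap the two names to get y ~ x as in the question.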

# specify what you want to include
# both a and b
includes <- c("a","b")

# define formula
frmla <- as.formula(paste0("x ~ y", 
                           ifelse(!is.null(includes), 
                                  paste0("+", paste(includes, collapse = "+")),"")))
# > frmla
# x ~ y + a + b

# Run model
lm(frmla)

#Call:
#lm(formula = frmla)

#Coefficients:
#(Intercept)            y            a            b  
# -1.250e+00    7.500e-01    8.885e-17    2.500e-01  

Add as many as you like (only the formula is built here; it is not fitted):

includes <- c("a", "b", "c", "d", "f")

frmla <- as.formula(paste0("x ~ y", ifelse(!is.null(includes), paste0("+",paste(includes, collapse = "+")),"")))
#> frmla
#x ~ y + a + b + c + d + f

Or none at all:

includes <- c()
frmla <- as.formula(paste0("x ~ y", ifelse(!is.null(includes), paste0("+",paste(includes, collapse = "+")),"")))

# > frmla
# x ~ y
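The same string-pasting idea can be wrapped in a small helper. This is only a sketch: `build_formula` and its arguments are hypothetical names, and it uses y as the response to match the question. Because c() drops NULL, pasting `c(base, includes)` handles the empty case without needing ifelse().

```r
# Hypothetical helper: build "resp ~ base + includes..." from character vectors
build_formula <- function(includes = NULL, resp = "y", base = "x") {
  # c() drops a NULL `includes`, so the empty case needs no special handling
  rhs <- paste(c(base, includes), collapse = " + ")
  as.formula(paste(resp, "~", rhs))
}

f_both <- build_formula(c("a", "b"))
f_none <- build_formula()

f_both
f_none
```

Here `f_both` is y ~ x + a + b and `f_none` is y ~ x; either can then be passed to lm() as in the examples above.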

Upvotes: 2
