J. Doe

Reputation: 1750

R - Using apply only on specific columns

I am using a package in R that fits a specific form of a regression model. However, unlike the base lm() function that permits the x and y to be separate objects, the function that I'm using requires them to be in the same dataframe.

My problem arises because I have many predictors and I want to regress y on each of them separately. I have a dataframe with 10 predictor variables (x1, x2... x10) and one criterion variable (y), 11 columns in total. I could use a for loop to run ten separate regressions, but I want to avoid it and use the apply function instead. However, if I call apply on my dataframe, in the last step it will regress y on y itself, and I want to avoid this. Is there a function similar to apply that lets me specify that it should run only 10 times and not 11, or is there another workaround to this problem?

Upvotes: 0

Views: 576

Answers (2)

Artem Sokolov

Reputation: 13731

Here's a tidyverse solution:

library( tidyverse )

xx <- c("disp", "hp", "drat", "wt")   # Names of predictor variables
y <- "mpg"                            # Name of response

str_c( y, xx, sep="~" ) %>%
  map( as.formula ) %>%               # Optional (see below)
  map( lm, data = mtcars )

str_c simply builds up formulas as strings (e.g., "mpg~disp"). While lm accepts strings directly, your particular regression model might not. If it requires an actual formula, you can convert strings to formulas using as.formula (Thanks for the suggestion, @J.Doe!). Other than that, simply replace lm with your particular model and mtcars with your data frame.


Here's the same solution using base R without any additional packages:

strs <- paste( y, xx, sep="~" )
strs <- lapply( strs, as.formula )    # Optional
lapply( strs, lm, data=mtcars )
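
As a possible follow-up (a minimal sketch, not part of the solution above; the names fits and slopes are just illustrative), you could name the resulting list by predictor and pull the slope out of each model:

```r
xx <- c("disp", "hp", "drat", "wt")   # Names of predictor variables
y <- "mpg"                            # Name of response

# Build one formula per predictor and fit each model
formulas <- lapply(paste(y, xx, sep = "~"), as.formula)
fits <- lapply(formulas, lm, data = mtcars)

# Label each fit with its predictor so results are easy to look up
names(fits) <- xx

# Second coefficient of each fit is the slope (the first is the intercept)
slopes <- sapply(fits, function(fit) unname(coef(fit)[2]))
```

After this, fits[["wt"]] is the model for mpg ~ wt and slopes is a numeric vector named by predictor.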

Upvotes: 2

G. Grothendieck

Reputation: 270248

Using the built-in anscombe data frame, which has columns x1, x2, x3, x4, y1, y2, y3, y4, suppose we want to regress y1 on each of x1, x2, x3, x4 separately.

First create a character vector of the names of the independent variables, xnames, and then use lapply to apply run_lm to each name. That function pastes together the required formula and runs lm, returning an object of class "lm". The result, L, is a list of such objects, one for each regression.

No packages are used.

xnames <- names(anscombe)[1:4]
run_lm <- function(nm) lm(paste("y1 ~", nm), anscombe)
L <- lapply(xnames, run_lm)

Alternatively, this shorter version of run_lm would also work with the lapply above, but the Call: line in the output is not as nice:

run_lm <- function(nm) lm(anscombe[c("y1", nm)])
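
If, as in the question, the response sits in the same data frame as the predictors and every other column is a predictor, one way to avoid regressing the response on itself is to drop its name from the vector being looped over. This is only a sketch under that assumption: dat is a hypothetical stand-in for your data frame (here cut down from anscombe), and reformulate from the stats package is used in place of paste to build the formula.

```r
# Stand-in for a data frame holding predictors plus one response column
dat <- anscombe[c("x1", "x2", "x3", "x4", "y1")]

# Every column name except the response
xnames <- setdiff(names(dat), "y1")

# reformulate() builds the formula y1 ~ <nm> from character strings
run_lm <- function(nm) lm(reformulate(nm, response = "y1"), data = dat)

L <- lapply(xnames, run_lm)
names(L) <- xnames   # so that L$x1, L$x2, ... refer to the matching fit
```

lapply then runs once per predictor (four times here) rather than once per column.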

Upvotes: 0
