R - Using apply only on specific columns

Question

I am using an R package that fits a specific form of regression model. However, unlike base R's lm() function, which permits x and y to be separate objects, the function I'm using requires them to be in the same data frame.

My problem arises because I have many variables, and I want to regress y on each of them separately. My data frame has 10 predictor variables (x1, x2, ..., x10) and one criterion variable (y), 11 columns in total. I could use a for loop to run the ten separate regressions, but I want to avoid that and use the apply function instead. However, if I call apply on the whole data frame, in the last step it will regress y on itself, which I want to avoid. Is there a function similar to apply that I can tell to run only 10 times (once per predictor) and not 11, or is there another workaround for this problem?

Artem Sokolov · Accepted Answer

Here's a tidyverse solution:

library( tidyverse )

xx <- c("disp", "hp", "drat", "wt")   # Names of predictor variables
y <- "mpg"                            # Name of response

str_c( y, xx, sep="~" ) %>%
  map( as.formula ) %>%               # Optional (see below)
  map( lm, data = mtcars )

str_c builds each formula as a string (e.g., "mpg~disp"). While lm accepts such strings directly, your particular regression function might not. If it requires an actual formula object, you can convert the strings to formulas using as.formula (thanks for the suggestion, @J.Doe!). Other than that, simply replace lm with your particular model-fitting function and mtcars with your data frame.
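Rather than typing the predictor names by hand, they can also be taken from the data frame itself with the response excluded, which is exactly what keeps the y-on-y fit out of the loop. The sketch below is base R, uses mtcars as a stand-in for your data (its 11 columns conveniently give 10 predictors plus mpg as the response), uses lm as a stand-in for your model, and swaps in reformulate() as an alternative to pasting strings together:

```r
# Stand-in data: mtcars, with mpg playing the role of the response y
y <- "mpg"

# Every column except the response; excluding y here avoids fitting y ~ y
xx <- setdiff(names(mtcars), y)

# One model per predictor: mpg ~ cyl, mpg ~ disp, ...
# reformulate() builds the formula object directly, so no as.formula() step is needed
fits <- lapply(xx, function(x) lm(reformulate(x, response = y), data = mtcars))
names(fits) <- xx  # label each model by its predictor

length(fits)  # 10: one model per predictor, none for mpg itself
```

If your package's function needs only the relevant columns, passing data = mtcars[c(y, x)] inside the anonymous function is one way to hand it a two-column data frame per fit.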

Here's the same solution using base R without any additional packages:

strs <- paste( y, xx, sep="~" )
strs <- lapply( strs, as.formula )    # Optional
lapply( strs, lm, data=mtcars )
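Once the models are in a list, the per-predictor results can be collected in one pass. This is a sketch that assumes the fitted objects support coef() the way lm objects do; your package's model class may need a different accessor:

```r
y  <- "mpg"
xx <- c("disp", "hp", "drat", "wt")

strs <- lapply(paste(y, xx, sep = "~"), as.formula)
fits <- lapply(strs, lm, data = mtcars)
names(fits) <- xx  # label each model by its predictor

# coef() of a single-predictor lm returns c("(Intercept)", <predictor>);
# [[2]] keeps the slope, and sapply() carries the list names over
slopes <- sapply(fits, function(fit) coef(fit)[[2]])
slopes
```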
