MambaMentality

Reputation: 129

Regression with many variables, but not enough to justify using . and subtracting unnecessary variables

I'm trying to run a regression with roughly 20 explanatory variables from a dataset that has 50 variables. So it looks something like:

lm(data=data, formula = y ~ explanatory_1 + ... + explanatory_20)

Obviously this works fine, but we want the code to look a little cleaner. A lot of answers tell you to use .; however, I don't want to do that, because the dataset has about 20 or so variables that we don't use in the regression, i.e. we'd be subtracting as many variables as we include in the normal regression.

Is there a way to group the explanatory variables into a list, so it can instead look like

lm(data=data, formula = y ~ list)?

Furthermore, in some specifications we include a new covariate that also interacts with all the original covariates, so ideally we would have

lm(data=data, formula = y ~ list + new_var + new_var:list).

Can this be done? Thanks!

Upvotes: 0

Views: 60

Answers (1)

otheracct

Reputation: 66

You can put the explanatory variables in a character vector and use reformulate() to build the formula.

x_vars <- c('cyl', 'disp', 'hp')
# builds and fits mpg ~ cyl + disp + hp
lm(data = mtcars, formula = reformulate(x_vars, response = 'mpg'))
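The same idea should extend to the interaction case in the question: build the extra colon terms with paste() and pass them to reformulate() along with the main effects. A minimal sketch using mtcars, with wt standing in for the new covariate (the variable names here are placeholders, not from the question):

x_vars  <- c('cyl', 'disp', 'hp')
new_var <- 'wt'
# main effects for the originals and new_var, plus new_var interacted with each original
rhs <- c(x_vars, new_var, paste(x_vars, new_var, sep = ':'))
# builds and fits mpg ~ cyl + disp + hp + wt + cyl:wt + disp:wt + hp:wt
lm(data = mtcars, formula = reformulate(rhs, response = 'mpg'))

Equivalently, since * in a formula expands to main effects plus interactions, a single term label such as '(cyl + disp + hp) * wt' produces the same model.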

Upvotes: 3
