Raynor

Reputation: 277

How to conveniently add a large set of regressors in R?

I have to add approximately 30 dummy variables to a regression.

If my variables were named dummy1 through dummy30, I would select them with an asterisk wildcard in Stata. It would simply be regress y dummy*, and Stata would add all variables whose names start with 'dummy'.

Can anyone point me to a similarly convenient procedure in R that saves me from writing out 30 variable names?

Upvotes: 3

Views: 1752

Answers (2)

Sven Hohenstein

Reputation: 81713

The function reformulate is the right option for creating formulas based on strings.

An example data frame:

set.seed(1)
dat <- data.frame(y = rnorm(10), 
                  dummy1 = rnorm(10),
                  dummy2 = rnorm(10),
                  dummy3 = rnorm(10),
                  other = rnorm(10))

Now, grep finds all variables whose names start with dummy, and the result is passed to reformulate:

form <- reformulate(grep("^dummy", names(dat), value = TRUE), response = "y")
# y ~ dummy1 + dummy2 + dummy3

This formula can be used for lm:

lm(form, dat)
# Call:
# lm(formula = form, data = dat)
#
# Coefficients:
# (Intercept)       dummy1       dummy2       dummy3
#     0.04785      0.09323     -0.63404     -0.19547

Upvotes: 3

Roman Luštrik

Reputation: 70653

You have two options. The first is to subset the data.frame so that it contains only the dependent variable and the dummy* variables. You can then call lm(dep ~ ., data = your.data); the dot in the formula stands for all columns other than dep, so they are all used as predictors. To keep only dep and the dummy columns, select columns (not rows): your.data[, grepl("^(dep|dummy)", names(your.data))].
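A minimal sketch of the subsetting option, assuming a made-up data frame your.data with columns dep, dummy1 to dummy3, and other (these names and the sub.data helper are hypothetical, chosen only for illustration). Note that the logical index goes after the comma so that columns, not rows, are selected:

```r
# Hypothetical example data; the column names are assumptions for illustration.
set.seed(1)
your.data <- data.frame(dep    = rnorm(10),
                        dummy1 = rnorm(10),
                        dummy2 = rnorm(10),
                        dummy3 = rnorm(10),
                        other  = rnorm(10))

# Keep only the columns whose names start with "dep" or "dummy".
keep <- grepl("^(dep|dummy)", names(your.data))
sub.data <- your.data[, keep]

# The dot stands for every column in sub.data except the response.
fit <- lm(dep ~ ., data = sub.data)
names(coef(fit))
```

Anchoring the pattern with ^ avoids accidentally matching a column that merely contains "dep" or "dummy" somewhere in its name.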

The second option is to construct the formula from a string using paste.

formula(paste("dep ~", paste("dummy", 1:10, sep = "", collapse = "+")))
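As a rough sketch of how the pasted formula could be used end to end with 30 regressors, assuming a hypothetical data frame your.data with a response dep and columns dummy1 to dummy30 (the data here is random and exists only so the example runs):

```r
# Hypothetical data: a response dep and 30 columns named dummy1 ... dummy30.
set.seed(1)
n <- 100
your.data <- data.frame(dep = rnorm(n), matrix(rnorm(n * 30), nrow = n))
names(your.data)[-1] <- paste0("dummy", 1:30)

# Build the right-hand side "dummy1 + dummy2 + ... + dummy30" as a string.
rhs <- paste0("dummy", 1:30, collapse = " + ")

# Turn the full string into a formula and fit the model.
form <- as.formula(paste("dep ~", rhs))
fit <- lm(form, data = your.data)

# One coefficient per dummy plus the intercept.
length(coef(fit))
```

This approach relies on the variables following a regular numbering scheme; if the names are irregular, matching them with grep on names(your.data) is likely the safer route.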

Upvotes: 4
