Reputation: 277
I have to add approximately 30 dummy variables to a regression.
If my variables were named dummy1 through dummy30, I could refer to them with an asterisk wildcard in Stata: regress y dummy* would add all variables starting with 'dummy'.
Can anyone point me to a similarly convenient procedure in R that saves me from writing out 30 variable names?
Upvotes: 3
Views: 1752
Reputation: 81713
The function reformulate is well suited to creating formulas from strings.
An example data frame:
set.seed(1)
dat <- data.frame(y = rnorm(10),
                  dummy1 = rnorm(10),
                  dummy2 = rnorm(10),
                  dummy3 = rnorm(10),
                  other = rnorm(10))
Now, grep is used to find all dummy* variables. The result is passed to reformulate:
form <- reformulate(grep("^dummy", names(dat), value = TRUE), response = "y")
# y ~ dummy1 + dummy2 + dummy3
This formula can be passed to lm:
lm(form, dat)
# Call:
# lm(formula = form, data = dat)
#
# Coefficients:
# (Intercept)       dummy1       dummy2       dummy3
#     0.04785      0.09323     -0.63404     -0.19547
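As a quick sanity check, the generated formula can be compared with one written out by hand. This is only a sketch that rebuilds the toy data from above; the names dummy_vars, fit_auto, fit_manual and same_fit are made up for the illustration.

```r
# Toy data mirroring the example above (values are random, only the names matter).
set.seed(1)
dat <- data.frame(y = rnorm(10),
                  dummy1 = rnorm(10),
                  dummy2 = rnorm(10),
                  dummy3 = rnorm(10),
                  other = rnorm(10))

# Collect every column whose name starts with "dummy".
# The "^" anchor keeps columns such as "not_dummy" out of the match.
dummy_vars <- grep("^dummy", names(dat), value = TRUE)

# Build y ~ dummy1 + dummy2 + dummy3 from the character vector.
form <- reformulate(dummy_vars, response = "y")

# Fit once with the generated formula and once with a hand-written one.
fit_auto <- lm(form, data = dat)
fit_manual <- lm(y ~ dummy1 + dummy2 + dummy3, data = dat)

# Both fits should give the same coefficients.
same_fit <- all.equal(coef(fit_auto), coef(fit_manual))
print(same_fit)
```

With 30 dummies nothing changes except the number of matching column names, so the same three lines (grep, reformulate, lm) should carry over.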
Upvotes: 3
Reputation: 70653
You have two options. The first is to subset the data.frame so that it contains only the dummy* variables and the dependent variable. You can then call lm(dep ~ ., data = your.data). The dot on the right-hand side stands for all columns other than dep, so every remaining column is used as a predictor.
To reduce a data.frame to dep and the predictors, select columns (not rows) by name: your.data[, grepl("^(dep|dummy)", names(your.data))].
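A minimal sketch of this first option, assuming a small made-up data frame (the columns dummy1, dummy2 and other, and the names keep and sub.data, are illustrative only):

```r
# Hypothetical data: one response, two dummies, one unrelated column.
set.seed(1)
your.data <- data.frame(dep = rnorm(10),
                        dummy1 = rnorm(10),
                        dummy2 = rnorm(10),
                        other = rnorm(10))

# Logical vector marking the columns to keep.
# Caveat: the pattern also keeps any other column starting with "dep".
keep <- grepl("^(dep|dummy)", names(your.data))

# The selector goes in the column position (after the comma).
sub.data <- your.data[, keep]

# "." expands to every column of sub.data except dep.
fit <- lm(dep ~ ., data = sub.data)
print(names(coef(fit)))
```

The column other is dropped before fitting, so it never enters the model.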
The second option is to construct the formula with paste:
formula(paste("dep ~", paste("dummy", 1:10, sep = "", collapse = "+")))
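The same idea can be split into steps, which may be easier to read and to adapt to a different number of dummies. This is a sketch using paste0 and as.formula in place of the nested paste call; n_dummies, rhs and form are names chosen for the illustration.

```r
# Number of dummy variables; assumed to be 3 here to keep the output short.
n_dummies <- 3

# Right-hand side: "dummy1 + dummy2 + dummy3".
rhs <- paste0("dummy", seq_len(n_dummies), collapse = " + ")

# Glue the response on and convert the string into a formula object.
form <- as.formula(paste("dep ~", rhs))
print(form)

# Variables the formula refers to, response first.
print(all.vars(form))
```

Unlike the grep-based approaches, this one assumes the dummies are numbered consecutively from 1, so it does not need to inspect the data frame at all.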
Upvotes: 4