Reputation: 11657
I first use grep
to obtain all variable names that begin with the preface: "h_." I then collapse that array into a single string, separated with plus signs. Is there a way to subsequently use this string in a linear regression?
For example:
holiday_array <- grep("h_", names(df), value=TRUE)
holiday_string = paste(holiday_array, collapse=' + ' )
r_3 <- lm(log(assaults) ~ year + month + holiday_string, data = df)
I get the straightforward error variable lengths differ (found for 'holiday_string')
I can do it like this, for example:
holiday_formula <- as.formula(paste('log(assaults) ~ attend_v + year+ month + ', paste("", holiday_vars, collapse='+')))
r_3 <- lm(holiday_formula, data = df)
But I don't want to have to type a separate formula construction for each new set of controls. I want to be able to add the "string" inside the lm function. Is this possible?
The above is problematic, because let's say I want to then add another set of control variables to the formula contained in holiday_formula
, so something like this:
weather_vars <- grep("w_", names(df), value=TRUE) weather_formula <- as.formula(paste(holiday_formula, paste("+", weather_vars, collapse='+')))
Not sure how you would do the above.
Upvotes: 1
Views: 3711
Reputation: 263332
I don't know a simple method for construction of a formula argument different than the one you are rejecting (although I considered and rejected using update.formula
since it would also have required using as.formula
), but this is an alternate method for achieving the same goal. It uses the "."-expansion feature of R-formulas and relies on the ability of the [
-function to accept character argument for column selection:
r_3 <- lm(log(assaults) ~ attend_v + year+ month + . ,
data = df[ , c('assaults', 'attend_v', 'year', 'month', holiday_vars] )
Upvotes: 5