Parseltongue
Reputation: 11657

R: Use string containing variable names in regression

I first use grep to obtain all variable names that begin with the prefix "h_". I then collapse that vector into a single string, separated by plus signs. Is there a way to subsequently use this string in a linear regression?

For example:

holiday_array <- grep("^h_", names(df), value=TRUE)
holiday_string =  paste(holiday_array, collapse=' + ' )
r_3 <- lm(log(assaults) ~ year + month + holiday_string, data = df)

I get the straightforward error: variable lengths differ (found for 'holiday_string')

I can do it like this, for example:

  holiday_formula <- as.formula(paste('log(assaults) ~ attend_v + year+ month + ', paste("", holiday_vars, collapse='+')))
  r_3 <- lm(holiday_formula, data = df)

But I don't want to have to type a separate formula construction for each new set of controls. I want to be able to add the "string" inside the lm function. Is this possible?

The above is problematic because I may then want to add another set of control variables to the formula contained in holiday_formula, with something like this:

  weather_vars <- grep("w_", names(df), value=TRUE)
  weather_formula <- as.formula(paste(holiday_formula, paste("+", weather_vars, collapse='+')))

Not sure how you would do the above.

Upvotes: 1

Views: 3711

Answers (1)

IRTFM

Reputation: 263332

I don't know of a simpler way to construct the formula argument than the one you are rejecting (I considered update.formula, but rejected it because it would also have required as.formula). Here is an alternative that achieves the same goal. It uses the "." expansion feature of R formulas and relies on the ability of the [ function to accept a character vector for column selection:

  r_3 <- lm(log(assaults) ~ attend_v + year + month + . ,
            data = df[ , c('assaults', 'attend_v', 'year', 'month', holiday_vars)] )
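To make the idea concrete, here is a minimal, self-contained sketch of the "." approach on toy data. The data frame, its values, and the column names h_xmas, h_july4, w_rain, w_temp and unrelated are all made up for illustration; only assaults, attend_v, year, month and the h_/w_ prefixes come from the question. Because every selected column other than the response is a regressor, the formula can be reduced to log(assaults) ~ . here, and a further set of controls is added simply by extending the column selection.

```r
# Toy data: all values and the h_/w_/unrelated column names are hypothetical.
set.seed(1)
n <- 60
df <- data.frame(
  assaults  = rpois(n, lambda = 20) + 1,  # +1 keeps log() finite
  attend_v  = rnorm(n),
  year      = rep(2001:2005, each = 12),
  month     = rep(1:12, times = 5),
  h_xmas    = rbinom(n, 1, 0.5),
  h_july4   = rbinom(n, 1, 0.5),
  w_rain    = rnorm(n),
  w_temp    = rnorm(n),
  unrelated = rnorm(n)                    # should never enter the models
)

# Character vectors of column names, selected by prefix
holiday_vars <- grep("^h_", names(df), value = TRUE)
weather_vars <- grep("^w_", names(df), value = TRUE)
base_vars    <- c("assaults", "attend_v", "year", "month")

# "." stands for every column of `data` that is not used in the response,
# so the set of controls is chosen by subsetting the data frame.
r_3 <- lm(log(assaults) ~ ., data = df[, c(base_vars, holiday_vars)])

# Adding the weather controls only requires a longer column selection.
r_4 <- lm(log(assaults) ~ ., data = df[, c(base_vars, holiday_vars, weather_vars)])

# Inspect which terms each model ended up with
attr(terms(r_3), "term.labels")
attr(terms(r_4), "term.labels")
```

The trade-off is that the controls are specified through the data argument rather than in the formula itself, so any column left in the subset becomes a regressor.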

Upvotes: 5
