Scott Chang

Reputation: 25

Expanding regression models in R

I am fitting a regression with several categorical and continuous variables mixed together. To simplify my question: I want a regression model that predicts driving time for a given driver in a given zone from the miles driven. Let's say I have 5 different drivers and 2 zones in my training data.

I know I probably need to build 5*2=10 regression models for prediction. What I am using in R is

m <- lm(driving_time ~ factor(driver)+factor(zone)+miles)

But it seems R doesn't expand the combinations. Is there a smart way to do the expansion automatically in R, or do I have to write the 10 regression models one by one? Thank you.

Upvotes: 1

Views: 540

Answers (1)

Gavin Simpson

Reputation: 174813

Please read ?formula. In a formula, + means include that variable as a main effect. You seem to be looking for an interaction term between driver and zone. You create an interaction term using the : operator. There is also a shortcut to get both main and interaction effects via the * operator: a * b is equivalent to a + b + a:b.
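To see how R expands these operators without fitting anything, you can inspect the term labels of a formula. This is only a small sketch; the variable names y, a and b are placeholders and do not need to exist as data.

```r
# terms() parses a formula symbolically, so no data frame is required.
main_only  <- attr(terms(y ~ a + b), "term.labels")  # main effects only
inter_only <- attr(terms(y ~ a:b),   "term.labels")  # interaction only
crossed    <- attr(terms(y ~ a * b), "term.labels")  # main effects + interaction

print(main_only)
print(inter_only)
print(crossed)
```

The last call lists a, b and a:b, which is the expansion of a * b.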

There is some confusion as to whether you want miles to also interact, but I'll assume not here as you only mention 2 x 5 terms.

foo <- transform(foo, driver = factor(driver), zone = factor(zone))
m <- lm(driving_time ~ driver * zone + miles, data = foo)

Here I assume your data are in data frame foo. The first line separates the data processing from the model specification/fitting by converting the variables of interest to factors before fitting.

The formula then specifies main and interaction effects for driver and zone, plus a main effect for miles.
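As a rough illustration, here is the same model fitted to simulated data. The data are made up purely for this sketch (driver labels d1 to d5, zones A and B, and the numbers used to generate driving_time are all assumptions), so only the structure of the fit is meaningful. Note that this model gives each driver/zone combination its own intercept but shares a single miles slope across all of them.

```r
set.seed(1)

# Hypothetical data: 5 drivers x 2 zones, 6 trips per combination.
foo <- expand.grid(driver = paste0("d", 1:5),
                   zone   = c("A", "B"),
                   trip   = 1:6)
foo$miles        <- runif(nrow(foo), min = 1, max = 30)
foo$driving_time <- 5 + 2 * foo$miles + rnorm(nrow(foo))

# Same two steps as in the answer: convert to factors, then fit.
foo <- transform(foo, driver = factor(driver), zone = factor(zone))
m   <- lm(driving_time ~ driver * zone + miles, data = foo)

# Intercept + 4 driver + 1 zone + 4 driver:zone + 1 miles coefficients.
n_coef <- length(coef(m))
print(n_coef)

# Predict for one driver/zone combination at a given distance.
new_trip <- data.frame(driver = factor("d3", levels = levels(foo$driver)),
                       zone   = factor("B",  levels = levels(foo$zone)),
                       miles  = 12)
pred <- predict(m, newdata = new_trip)
print(pred)
```

A single fitted object then covers all 10 driver/zone combinations through predict(), so there is no need to write the models out one by one.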

If you want interactions between all three then:

m <- lm(driving_time ~ driver * zone * miles, data = foo)

or

m <- lm(driving_time ~ (driver + zone + miles)^3, data = foo)

would do that for you.
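If you want to convince yourself that the two three-way formulas specify the same model, you can compare their expanded terms. This is a sketch that only parses the formulas, so no data are needed.

```r
f_star  <- driving_time ~ driver * zone * miles
f_power <- driving_time ~ (driver + zone + miles)^3

labels_star  <- attr(terms(f_star),  "term.labels")
labels_power <- attr(terms(f_power), "term.labels")

print(labels_star)

# TRUE when both formulas expand to the same set of terms.
same_terms <- setequal(labels_star, labels_power)
print(same_terms)
```

Both expand to the three main effects, the three two-way interactions and the three-way interaction. With driver and zone as factors, that lets the miles slope as well as the intercept differ for every driver/zone combination, which is the closest single-model analogue of fitting 10 separate regressions.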

Upvotes: 1
