Reputation: 25
I am doing a regression with several categorical and continuous variables mixed together. To simplify my question: I want to create a regression model that predicts driving time for a given driver in different zones, based on miles driven. Let's say I have 5 different drivers and 2 zones in my training data.
I know I probably need to build 5*2=10 regression models for prediction. What I am using in R is
m <- lm(driving_time ~ factor(driver)+factor(zone)+miles)
But it seems that R doesn't expand the combinations. My question is whether there is a smart way to do the expansion automatically in R, or whether I have to write the 10 regression models one by one. Thank you.
Upvotes: 1
Views: 540
Reputation: 174813
Please read ?formula. In a formula, + means include that variable as a main effect. You seem to be looking for an interaction term between driver and zone. You create an interaction term using the : operator. There is also a shortcut to get both main and interaction effects via the * operator.
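As a quick illustration of how these operators expand, the sketch below asks R for the term labels of three small formulas. The variable names y, a and b are placeholders chosen for this example, not names from the question; terms() works on the formula symbolically, so no data are needed.

```r
# Placeholder names (y, a, b) are assumptions for illustration only.
# terms() expands a formula symbolically, so no data frame is required.
main_only  <- attr(terms(y ~ a + b), "term.labels")  # main effects only
inter_only <- attr(terms(y ~ a:b),   "term.labels")  # interaction only
crossed    <- attr(terms(y ~ a * b), "term.labels")  # main effects plus interaction

print(main_only)   # "a" "b"
print(inter_only)  # "a:b"
print(crossed)     # "a" "b" "a:b"
```

So a * b is just shorthand for a + b + a:b.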
There is some confusion as to whether you want miles to also interact, but I'll assume not here as you only mention 2 x 5 terms.
foo <- transform(foo, driver = factor(driver), zone = factor(zone))
m <- lm(driving_time ~ driver * zone + miles, data = foo)
Here I assume your data are in data frame foo. The first line separates the data processing from the model specification/fitting by converting the variables of interest to factors before fitting.
The formula then specifies main and interaction effects for driver and zone, plus a main effect for miles.
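To see what this model looks like in practice, here is a minimal sketch on simulated data. The data values are made up purely for illustration (5 drivers, 2 zones, 6 trips per combination), and the coefficients used to generate driving_time are arbitrary assumptions; only the column names and the model formula come from the question and answer.

```r
# Simulated data: all numbers below are assumptions for illustration only.
set.seed(1)
foo <- expand.grid(driver = 1:5, zone = 1:2, trip = 1:6)
foo$miles <- runif(nrow(foo), min = 1, max = 30)
foo$driving_time <- 5 + 2 * foo$miles + foo$driver + 3 * foo$zone +
  rnorm(nrow(foo))

# Convert the grouping variables to factors before fitting
foo <- transform(foo, driver = factor(driver), zone = factor(zone))

# Main effects and interaction for driver and zone, plus a main effect for miles
m <- lm(driving_time ~ driver * zone + miles, data = foo)

# 1 intercept + 4 driver + 1 zone + 1 miles + 4 driver:zone = 11 coefficients
n_coef <- length(coef(m))
print(n_coef)

# Predict for one driver/zone combination; factor levels must match the fit
new_trip <- data.frame(
  driver = factor(3, levels = levels(foo$driver)),
  zone   = factor(2, levels = levels(foo$zone)),
  miles  = 12
)
pred <- predict(m, newdata = new_trip)
print(pred)
```

Note that this single model gives each of the 10 driver/zone combinations its own intercept, but all combinations share one slope for miles.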
If you want interactions between all three then:
m <- lm(driving_time ~ driver * zone * miles, data = foo)
or
m <- lm(driving_time ~ (driver + zone + miles)^3, data = foo)
would do that for you.
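As a small check that the two spellings describe the same model, the sketch below compares the terms each formula expands to. It only inspects the formulas symbolically, so no data are needed.

```r
# Compare the expanded terms of the two three-way formulas
t_star <- attr(terms(driving_time ~ driver * zone * miles), "term.labels")
t_pow  <- attr(terms(driving_time ~ (driver + zone + miles)^3), "term.labels")

print(t_star)
same_terms <- setequal(t_star, t_pow)
print(same_terms)  # TRUE
```

Both expand to the three main effects, the three two-way interactions and the single three-way interaction. With driver and zone as factors, this full model lets every driver/zone combination have its own intercept and its own slope for miles, which is the closest single-model equivalent of fitting the 10 separate regressions.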
Upvotes: 1