Jh00ni

Linear regression prediction using interaction terms in R

I am trying to fit a model that includes an interaction term and to generate out-of-sample predictions from it.

My training sample has 3 variables and 11 rows. My test sample has 3 variables and 1 row.

My code is the following.

inter.model <- lm(Y.train ~ Y.lag.train +  X.1.train + X.1.train:X.2.train)

However, I am not quite sure how R handles interaction terms, so I also coded the predictions by hand, using the model's coefficients and the test data:

inter.prediction <- inter.model$coef[1] + inter.model$coef[2]*Y.lag.test + 
        inter.model$coef[3]*X.1.test + (inter.model$coef[4]*X.1.test*X.2.test)
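One way to see how R handles the interaction term is to inspect the design matrix that lm() builds from the formula. A minimal sketch, assuming the variables sit in a data frame with hypothetical column names Y.lag, X.1 and X.2 and made-up values: the X.1:X.2 column is simply the element-wise product of the two variables, which is what the hand-coded prediction above multiplies by the fourth coefficient.

```r
# Toy data frame with hypothetical column names and made-up values
df <- data.frame(Y.lag = c(1.0, 2.0, 3.0),
                 X.1   = c(0.5, 1.5, 2.5),
                 X.2   = c(2.0, 4.0, 6.0))

# Same right-hand side as in the question, written with data-frame column names
mm <- model.matrix(~ Y.lag + X.1 + X.1:X.2, data = df)

# Expected columns: (Intercept), Y.lag, X.1, X.1:X.2 (no separate X.2 main effect)
print(colnames(mm))

# For two numeric variables, the interaction column is their element-wise product
stopifnot(isTRUE(all.equal(unname(mm[, "X.1:X.2"]), df$X.1 * df$X.2)))
```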

I wanted to make sure that these predictions were coded correctly, so I tried to reproduce them with R's predict() function.

inter.pred.function <- predict(inter.model, newdata=test_data)

However, I get an error message:

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
  variable lengths differ (found for 'X.2.train')
In addition: Warning message:
'newdata' had 1 row but variables found have 11 rows

names(test_data)
[1] "Y.lag.test" "X.1.test" "X.1.test:X.2.test"
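The error names X.2.train, a variable from the training formula, not a column of test_data. predict() looks up newdata columns by the variable names used in the model formula, so a quick check is to list those names and compare them with names(test_data). A sketch, using simulated stand-in vectors (hypothetical values) in place of the question's data:

```r
# Simulated stand-ins for the question's training vectors (hypothetical values)
set.seed(1)
Y.lag.train <- rnorm(11)
X.1.train   <- rnorm(11)
X.2.train   <- rnorm(11)
Y.train     <- 1 + Y.lag.train + X.1.train * X.2.train + rnorm(11)

# Same formula as in the question, fitted from free-standing vectors
inter.model <- lm(Y.train ~ Y.lag.train + X.1.train + X.1.train:X.2.train)

# Variables that predict() will look for in newdata (response removed)
needed <- all.vars(delete.response(terms(inter.model)))
print(needed)  # "Y.lag.train" "X.1.train" "X.2.train"

# Column names reported for the question's test_data
test_cols <- c("Y.lag.test", "X.1.test", "X.1.test:X.2.test")
missing_cols <- setdiff(needed, test_cols)
print(missing_cols)  # none of the model's variables is a column of test_data
```

Because none of the formula's variables is found in newdata, they are looked up elsewhere (here, the 11-element vectors in the workspace), which matches the "variables found have 11 rows" warning.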

So my question is: how do you fit a linear regression with interaction terms in R and make predictions from it?

Answers (1)

jay.sf

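You won't need "X.1.test:X.2.test" in your new data; the interaction is created automatically in stats:::predict.lm via model.matrix(). What newdata does need is one column for each variable in the model formula, with exactly the names used when fitting. So fit the model with lm(..., data = train_data) and give test_data the same column names; otherwise predict() picks up the 11-row training vectors (Y.lag.train, X.1.train, X.2.train) from your workspace instead, which is what the warning reports.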
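A minimal sketch of fitting and predicting with an interaction term from data frames, using simulated data in place of the question's (the names train_data and test_data and the columns Y, Y.lag, X.1, X.2 are hypothetical). The one-row test frame carries only the raw variables, and predict() agrees with the hand-coded formula from the question:

```r
# Hypothetical data: 11 training rows and 1 test row, same column names in both
set.seed(42)
train_data <- data.frame(Y.lag = rnorm(11), X.1 = rnorm(11), X.2 = rnorm(11))
train_data$Y <- with(train_data,
                     1 + 0.5 * Y.lag + X.1 + 2 * X.1 * X.2 + rnorm(11, sd = 0.1))
test_data <- data.frame(Y.lag = 0.3, X.1 = -1.2, X.2 = 0.8)

# Fit from the data frame; no precomputed interaction column is needed
inter.model <- lm(Y ~ Y.lag + X.1 + X.1:X.2, data = train_data)

# predict() rebuilds the X.1:X.2 column for test_data from the formula
pred.function <- predict(inter.model, newdata = test_data)

# Hand-coded prediction, as in the question, using coef() instead of $coef
b <- coef(inter.model)
pred.manual <- b[["(Intercept)"]] + b[["Y.lag"]] * test_data$Y.lag +
  b[["X.1"]] * test_data$X.1 + b[["X.1:X.2"]] * test_data$X.1 * test_data$X.2

# The two agree up to floating-point tolerance
print(all.equal(unname(pred.function), pred.manual))
```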

fit <- lm(mpg ~ hp*am, mtcars[1:10, ])

test <- mtcars[-(1:10), c('mpg', 'hp', 'am')]

as.numeric(predict(fit, newdata=test))
# [1] 20.220513 17.430053 17.430053 17.430053 16.206167 15.716612 14.982281 25.658824 27.141176 25.764706
# [11] 21.493355 18.898716 18.898716 14.247949 17.674830 25.658824 23.011765 20.682353  4.694118 14.117647
# [21] -2.823529 21.105882
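In the example above, hp*am is shorthand for hp + am + hp:am, i.e. both main effects plus their interaction. A small check, reusing the same mtcars split, that the two formulas give the same coefficients and predictions:

```r
# `*` expands to main effects plus interaction; `:` adds only the interaction
fit.star <- lm(mpg ~ hp * am, data = mtcars[1:10, ])
fit.long <- lm(mpg ~ hp + am + hp:am, data = mtcars[1:10, ])

# Both fits have the coefficients (Intercept), hp, am, hp:am
print(names(coef(fit.star)))

# Same design matrix, so the out-of-sample predictions match
test <- mtcars[-(1:10), c("mpg", "hp", "am")]
print(all.equal(predict(fit.star, newdata = test), predict(fit.long, newdata = test)))
```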
