Alex

Prediction using a linear model and the importance of data.frame

Why do we need to wrap the new values in data.frame() when predicting from an lm model with predict()?

The first chunk of code is supposed to be wrong and the second chunk is supposed to be correct:

dim(iris)
model_1<-lm(Sepal.Length~Sepal.Width, data=iris)
summary(model_1)
print(predict(model_1, Sepal.Width=c(1,3,4,5)))

dim(iris)
model_1<-lm(Sepal.Length~Sepal.Width, data=iris)
summary(model_1)
print(predict(model_1,data.frame(Sepal.Width=c(1,3,4,5))))

StupidWolf

When you call predict() on an lm object, R dispatches to the method predict.lm(). When you run it like this:

predict(model_1, Sepal.Width=c(1,3,4,5))

What you are actually doing is passing c(1,3,4,5) as an argument named Sepal.Width. predict.lm() has no argument with that name, so the value is absorbed by its ... argument and silently ignored; in particular, it is not treated as new data.
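One way to confirm this is to list the formal arguments of the lm method. The sketch below uses utils::getS3method() to fetch predict.lm() and checks which names it actually accepts; the variable names are only illustrative.

```r
# Fetch the lm method of the predict() generic (utils is attached by default)
pred_lm <- getS3method("predict", "lm")
arg_names <- names(formals(pred_lm))
print(arg_names)

# "newdata" is a real argument, "Sepal.Width" is not; the trailing "..."
# is what silently swallows an unknown named argument like Sepal.Width = ...
"newdata" %in% arg_names      # TRUE
"Sepal.Width" %in% arg_names  # FALSE
"..." %in% arg_names          # TRUE
```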

Since no newdata is supplied, this is the same as running predict.lm(model_1), which returns the fitted values for the 150 rows used to fit the model:

table(predict(model_1) == predict(model_1, Sepal.Width=c(1,3,4,5)))

TRUE 
 150
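A short sketch contrasting the two calls: the first returns the 150 fitted values, while naming the argument newdata explicitly makes the intent clear and returns one prediction per new row. It also shows that the column in newdata must carry the predictor's name; the column name "width" below is a hypothetical mistake chosen for illustration.

```r
model_1 <- lm(Sepal.Length ~ Sepal.Width, data = iris)

# Sepal.Width = ... is ignored, so this falls back to the fitted values
p_default <- predict(model_1, Sepal.Width = c(1, 3, 4, 5))
length(p_default)                         # 150
all.equal(p_default, fitted(model_1))     # TRUE

# Passing a data frame as newdata gives one prediction per new row
new_x <- data.frame(Sepal.Width = c(1, 3, 4, 5))
p_new <- predict(model_1, newdata = new_x)
length(p_new)                             # 4

# Hypothetical misnamed column: Sepal.Width cannot be found in newdata,
# so predict() stops with an error instead of returning predictions
bad_x <- data.frame(width = c(1, 3, 4, 5))
res <- tryCatch(predict(model_1, newdata = bad_x),
                error = function(e) conditionMessage(e))
print(res)
```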

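Because you fitted the model with a formula, predict.lm() needs the new values as a data frame passed as newdata, with columns named like the predictors in the formula (here Sepal.Width). It uses that data frame to rebuild the design matrix of independent (exogenous) variables, multiplies it by the coefficients, and returns the predicted values.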
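In symbols, the prediction is the design matrix times the coefficient vector. The coefficient values below (about 6.5262 for the intercept and -0.2234 for Sepal.Width) are rounded estimates for model_1, consistent with the predicted values printed in this answer:

```latex
\hat{\mathbf{y}} = X\hat{\boldsymbol\beta},\qquad
X = \begin{pmatrix} 1 & 1\\ 1 & 3\\ 1 & 4\\ 1 & 5 \end{pmatrix},\qquad
\hat{\boldsymbol\beta} = \begin{pmatrix}\hat\beta_0\\ \hat\beta_1\end{pmatrix}
\approx \begin{pmatrix} 6.5262\\ -0.2234 \end{pmatrix}

\hat{y}_1 = \hat\beta_0 + \hat\beta_1 \cdot 1 \approx 6.5262 - 0.2234 = 6.3028
```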

Briefly, this is what predict.lm() does:

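newdata = data.frame(Sepal.Width=c(1,3,4,5))
# drop the response (Sepal.Length) so only the predictor terms remain
Terms = delete.response(terms(model_1))
# build the design matrix (intercept column + Sepal.Width) from newdata
X = model.matrix(Terms,newdata)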

X
  (Intercept) Sepal.Width
1           1           1
2           1           3
3           1           4
4           1           5

X %*% coefficients(model_1)
      [,1]
1 6.302861
2 5.856139
3 5.632778
4 5.409417

predict(model_1,newdata)

       1        2        3        4 
6.302861 5.856139 5.632778 5.409417
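The steps above can be wrapped into a small function. This is a simplified sketch, and manual_predict() is a hypothetical helper, not part of R: it mirrors the core of predict.lm() (model.frame() with the stored factor levels, model.matrix() with the stored contrasts, then a matrix product) but ignores offsets, rank-deficient fits, weights, standard errors and intervals, which the real method handles.

```r
# Hypothetical helper: simplified version of the core steps of predict.lm()
manual_predict <- function(model, newdata) {
  # Drop the response so only the right-hand side of the formula is needed
  tt <- delete.response(terms(model))
  # Look up the predictor columns in newdata by name
  mf <- model.frame(tt, newdata, xlev = model$xlevels)
  # Build the design matrix (intercept + predictors)
  X <- model.matrix(tt, mf, contrasts.arg = model$contrasts)
  # Linear predictor X %*% beta, returned as a named vector
  drop(X %*% coef(model))
}

model_1 <- lm(Sepal.Length ~ Sepal.Width, data = iris)
newdata <- data.frame(Sepal.Width = c(1, 3, 4, 5))

manual  <- manual_predict(model_1, newdata)
builtin <- predict(model_1, newdata)
print(manual)
all.equal(manual, builtin)  # TRUE
```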
