Reputation: 353
Why do we need to wrap the new values in data.frame() when calling predict on an lm model?
The first chunk of code below is supposed to be wrong and the second chunk is supposed to be correct.
dim(iris)
model_1<-lm(Sepal.Length~Sepal.Width, data=iris)
summary(model_1)
print(predict(model_1, Sepal.Width=c(1,3,4,5)))
dim(iris)
model_1<-lm(Sepal.Length~Sepal.Width, data=iris)
summary(model_1)
print(predict(model_1,data.frame(Sepal.Width=c(1,3,4,5))))
Upvotes: 1
Views: 570
Reputation: 46948
When you call predict on an lm object, the function that actually runs is predict.lm. When you run it like:
predict(model_1, Sepal.Width=c(1,3,4,5))
you are passing c(1,3,4,5) as an argument named Sepal.Width. predict.lm has no parameter with that name, so the value is absorbed by ... and silently ignored.
Since no new input data is supplied, this is the same as running predict.lm(model_1), and you get back the fitted values for the original 150 rows:
table(predict(model_1) == predict(model_1, Sepal.Width=c(1,3,4,5)))
TRUE
150
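Comparing the lengths of the two results makes the difference easy to see. This is a small sketch that assumes the same model as in the question; the names fitted_only and new_preds are only illustrative.

```r
model_1 <- lm(Sepal.Length ~ Sepal.Width, data = iris)

# Sepal.Width= is not a parameter of predict.lm, so it falls into `...`
# and is ignored; the result is the fitted values for the training rows.
fitted_only <- predict(model_1, Sepal.Width = c(1, 3, 4, 5))
length(fitted_only)   # one value per row of iris

# Passing a data frame as newdata gives one prediction per new row.
new_preds <- predict(model_1, newdata = data.frame(Sepal.Width = c(1, 3, 4, 5)))
length(new_preds)
```

Naming the argument explicitly (newdata = ...) is optional, but it makes it harder to pass the values to the wrong place by accident.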
Because you fitted the model with a formula, predict.lm needs a data frame whose column names match the variables in that formula. It uses the data frame to rebuild the design matrix (the independent or exogenous variables), multiplies that matrix by the coefficients, and returns the predicted values.
Briefly, this is what predict.lm does:
newdata = data.frame(Sepal.Width=c(1,3,4,5))
Terms = delete.response(terms(model_1))
X = model.matrix(Terms,newdata)
X
(Intercept) Sepal.Width
1 1 1
2 1 3
3 1 4
4 1 5
X %*% coefficients(model_1)
[,1]
1 6.302861
2 5.856139
3 5.632778
4 5.409417
predict(model_1,newdata)
1 2 3 4
6.302861 5.856139 5.632778 5.409417
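The comparison above can also be done in one step with all.equal, and the same setup shows why the column name matters. This is a sketch under the assumption that no variable called Sepal.Width exists in your workspace; manual, wrong_name and result are illustrative names, not part of predict.lm.

```r
model_1 <- lm(Sepal.Length ~ Sepal.Width, data = iris)
newdata <- data.frame(Sepal.Width = c(1, 3, 4, 5))

# Rebuild the design matrix by hand and multiply by the coefficients.
X <- model.matrix(delete.response(terms(model_1)), newdata)
manual <- drop(X %*% coef(model_1))

# Should agree with predict() up to floating-point tolerance.
all.equal(manual, predict(model_1, newdata))

# A data frame whose column name does not match the formula cannot be used:
# the Sepal.Width variable is not found, so predict() raises an error.
wrong_name <- data.frame(width = c(1, 3, 4, 5))
result <- tryCatch(
  predict(model_1, wrong_name),
  error = function(e) conditionMessage(e)
)
result
```

The tryCatch call is only there so the error message can be inspected without stopping the script.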
Upvotes: 2