star14

Reputation: 11

Why is there a difference between linear regression fitted values and predicted values on training data?

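library(MASS)   # provides the Boston housing data
data(Boston)
head(Boston)
# 80/20 train/test split (random, so results vary between runs)
index <- sample(nrow(Boston), nrow(Boston) * 0.80)
train <- Boston[index, ]
test <- Boston[-index, ]
model_1 <- lm(medv ~ ., data = train)
# predict() without newdata returns predictions for the training data
model_1train_p <- predict(model_1)
# mean difference between stored fitted values and predictions
mean(model_1$fitted.values - model_1train_p)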

The code above reproduces the issue. I want to know why there is a non-zero difference.

Upvotes: 0

Views: 131

Answers (1)

dario

Reputation: 6483

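The difference comes from floating-point arithmetic: computers store numbers in binary with finite precision, and it is not always possible to get an exact binary representation of a decimal value. Two results that are mathematically identical but reached through a different sequence of operations, as the stored fitted values and the output of predict() most likely are, can therefore differ by a tiny rounding error. The difference you get is very, very small, many orders of magnitude below the fitted values themselves.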
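As a minimal sketch of this rounding effect, independent of the regression model, the following uses a few illustrative constants (my own choice, not taken from the question) to show that exact comparison fails and that the order of operations changes the last bits of a result:

```r
# Many decimal fractions (0.1, 0.2, 0.3) have no exact binary representation.
a <- 0.1 + 0.2
b <- 0.3

print(a == b)                  # FALSE: exact comparison sees the rounding error
print(isTRUE(all.equal(a, b))) # TRUE: comparison within a tolerance

# The same sum evaluated in a different order can round differently.
left  <- (0.1 + 0.2) + 0.3
right <- 0.1 + (0.2 + 0.3)
print(left == right)           # FALSE

# Show the stored values with 17 significant digits.
print(sprintf("%.17g", c(a, b, left, right)))
```

The same mechanism plausibly applies to the model: if the fitted values and the predictions are computed along different arithmetic paths, they need not agree in the final bits.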

If you want to check floating-point numbers for equality, use all.equal, which compares them within a small tolerance instead of bit for bit:

all.equal(model_1$fitted.values, model_1train_p)

Returns:

[1] TRUE
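To see how small the discrepancy actually is, here is a hedged sketch based on the question's setup. The seed value, the floor() call, and the variable names fitted_vals, pred_vals, max_abs_diff and same are my own additions for illustration; the exact size of the difference will depend on the platform and the sample drawn.

```r
library(MASS)  # provides the Boston data set used in the question

set.seed(1)    # arbitrary seed (an assumption), only for reproducibility
index <- sample(nrow(Boston), floor(nrow(Boston) * 0.80))
train <- Boston[index, ]
model_1 <- lm(medv ~ ., data = train)

fitted_vals <- fitted(model_1)   # same values as model_1$fitted.values
pred_vals   <- predict(model_1)  # predictions on the training data

# Largest absolute discrepancy between the two vectors.
max_abs_diff <- max(abs(fitted_vals - pred_vals))
print(max_abs_diff)

# For scale: the relative spacing of double-precision numbers.
print(.Machine$double.eps)

# all.equal() returns TRUE or a character description of the difference,
# so wrap it in isTRUE() when a single logical value is needed.
same <- isTRUE(all.equal(fitted_vals, pred_vals))
print(same)
```

Wrapping all.equal in isTRUE matters inside if() conditions, because a failed comparison returns a character string, not FALSE.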

Upvotes: 2
