Reputation: 1053
I have split the Boston dataset into training and test sets as below:
library(MASS)
smp_size <- floor(.7 * nrow(Boston))
set.seed(133)
train_boston <- sample(seq_len(nrow(Boston)), size = smp_size)
train_ind <- sample(seq_len(nrow(Boston)), size = smp_size)
train_boston <- Boston[train_ind, ]
test_boston <- Boston[-train_ind,]
nrow(train_boston)
# [1] 354
nrow(test_boston)
# [1] 152
Now I fit a model with lm and get the residual standard error (RSE) as below:
train_boston.lm <- lm(lstat~medv, train_boston)
summary(train_boston.lm)
summary(train_boston.lm)$sigma
How can I calculate the residual standard error for the test data set? I can't use the lm function on the test data set. Is there a method to calculate the RSE on the test data set?
Upvotes: 0
Views: 11226
Reputation: 206606
Here your residual standard error, as reported by summary(), is the same as the manual calculation that follows it:
summary(train_boston.lm)$sigma
# [1] 4.73988
sqrt(sum((fitted(train_boston.lm)-train_boston$lstat)^2)/
(nrow(train_boston)-2))
# [1] 4.73988
You are estimating two parameters (the intercept and the slope), so you lose two degrees of freedom and the residual degrees of freedom are n-2.
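If you want to check the degrees-of-freedom point without hard-coding the 2, here is a minimal sketch. It assumes only base R plus MASS for the data, and it fits on the whole Boston data set for brevity (so the names `fit`, `rss`, `df` and `rse_manual` are mine, not from the question, and the number will not match the 4.73988 shown for the training split).

```r
library(MASS)  # provides the Boston data set

# Any simple linear fit works for this check; all rows are used here for brevity.
fit <- lm(lstat ~ medv, data = Boston)

rss <- sum(residuals(fit)^2)  # residual sum of squares
df  <- df.residual(fit)       # n minus the number of estimated coefficients

rse_manual <- sqrt(rss / df)

# Both comparisons should come back TRUE
all.equal(rse_manual, summary(fit)$sigma)
df == nrow(Boston) - length(coef(fit))
```

Using df.residual() instead of a literal n-2 keeps the calculation right if you later add more predictors to the model.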
With your test data you're not really doing the same estimation, because no parameters are being fitted to those rows. But if you want the same type of calculation, using the model's predictions for the new data in place of the fitted values from the original model, you can do
sqrt(sum((predict(train_boston.lm, test_boston)-test_boston$lstat)^2)/
(nrow(test_boston)-2))
Although it may make more sense to just calculate the standard deviation of the prediction residuals:
sd(predict(train_boston.lm, test_boston)-test_boston$lstat)
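To see how the two test-set calculations above relate to each other, here is a rough side-by-side sketch. It is an illustration under my own assumptions, not the one right answer: the split is drawn fresh with a single sample() call, so it will not reproduce the exact rows or numbers from the question, and the names `err`, `rmse`, `rse_like` and `sd_err` are made up for this example. I have also added the plain root mean squared error, which divides by n and is a common choice for held-out data since nothing is estimated on those rows.

```r
library(MASS)  # provides the Boston data set

set.seed(133)
n <- nrow(Boston)
train_ind <- sample(seq_len(n), size = floor(0.7 * n))
train <- Boston[train_ind, ]
test  <- Boston[-train_ind, ]

fit <- lm(lstat ~ medv, data = train)

# Prediction errors on the held-out rows
err <- predict(fit, newdata = test) - test$lstat

rmse     <- sqrt(mean(err^2))                      # divides by n, no centering
rse_like <- sqrt(sum(err^2) / (length(err) - 2))   # the n - 2 version
sd_err   <- sd(err)                                # centers on mean(err), divides by n - 1

c(rmse = rmse, rse_like = rse_like, sd = sd_err, mean_error = mean(err))
```

Note that sd() subtracts the mean error before squaring, so it ignores any systematic over- or under-prediction on the test set, while the other two include it. If mean_error is close to zero, the three numbers will be close to each other.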
Upvotes: 3