Reputation: 67
I am teaching myself R from "An Introduction to Statistical Learning: With Applications in R". I expected both chunks of code below to give the same test mean squared error (MSE), but I get drastically different results. Can someone please help me find out why I am not getting the same MSE? It looks like the first chunk is the one that is wrong. Both use the Auto data set. My predictions and the book's predictions differ, even though both models were trained on the same index.
First Chunk (my code)
set.seed(1)
train_index = sample (392, 196)
Auto$index = c(1:nrow(Auto))
train_df = Auto[train_index,]
test_df = anti_join(Auto, train_df, by="index")
attach(train_df)
lm.fit = lm(mpg ~ horsepower)
predictions = predict(lm.fit, horsepower = test_df$horsepower)
mean((test_df$mpg - predictions)^2)
Second Chunk (book's code - An Introduction to Statistical Learning: With Applications in R)
set.seed(1)
train = sample (392, 196)
lm.fit = lm(mpg ~ horsepower , data = Auto , subset = train)
attach(Auto)
mean (( mpg - predict(lm.fit , Auto))[-train ]^2)
Upvotes: 3
Views: 48
Reputation: 17204
In your code, you're not specifying the test data correctly in predict(). predict() takes a data frame containing the predictor variables, passed to the newdata argument; instead, you include horsepower = test_df$horsepower, which just gets absorbed by ... and has no effect. With no newdata supplied, predict() returns the fitted values for the training data, so your MSE compares the test-set mpg against training-set predictions.
If you instead pass the whole test_df data frame to newdata, you get the same result as the text.
library(ISLR)
library(dplyr)
set.seed(1)
# OP’s code with change to predict()
train_index = sample(392, 196)
Auto$index = c(1:nrow(Auto))
train_df = Auto[train_index,]
test_df = anti_join(Auto, train_df, by="index")
attach(train_df)
lm.fit = lm(mpg ~ horsepower)
predictions = predict(lm.fit, newdata = test_df)
mean((test_df$mpg - predictions)^2)
# 23.26601
# ISLR code
set.seed (1)
train = sample (392 , 196)
lm.fit = lm(mpg ~ horsepower , data = Auto , subset = train)
attach(Auto)
mean (( mpg - predict(lm.fit , Auto))[-train ]^2)
# 23.26601
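To see the argument-swallowing behaviour in isolation, here is a minimal sketch on R's built-in mtcars data rather than Auto. The 20/12 split and the names train_rows, fit, wrong and right are illustrative assumptions, not part of the original question.

```r
# Illustrative split of the built-in mtcars data: 20 training rows, 12 test rows
train_rows <- 1:20
fit <- lm(mpg ~ hp, data = mtcars[train_rows, ])
test_df <- mtcars[-train_rows, ]

# Misnamed argument: `hp = ...` is absorbed by `...`, so newdata is missing
# and predict() falls back to predictions for the training rows
wrong <- predict(fit, hp = test_df$hp)

# Correct call: the test data frame is passed through newdata
right <- predict(fit, newdata = test_df)

length(wrong)  # 20, one value per training row
length(right)  # 12, one value per test row

# The "wrong" predictions match the model's fitted values on the training data
isTRUE(all.equal(unname(wrong), unname(fitted(fit))))
```

In the question the split is 196/196, so the training-set predictions happen to have the same length as test_df$mpg and the subtraction runs without any length warning, which is likely why the mistake went unnoticed.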
Upvotes: 1