Reputation: 209
I've run into a problem with the function bs().
library(ISLR)
library(ggplot2)
library(caret)
data(Wage)
#summary(Wage)
set.seed(123)
inTrain <- createDataPartition(Wage$wage, p = 0.7, list = FALSE)
training <- Wage[inTrain,]
testing <- Wage[-inTrain,]
library(splines)
bsBasis <- bs(training$age, df=3)
bsBasis[1:12,]
lm1 <- lm(wage ~ bsBasis, data=training)
lm1$coefficients
## (Intercept) bsBasis1 bsBasis2 bsBasis3
## 60.22 93.39 51.05 47.28
plot(training$age, training$wage, pch=19, cex=0.5)
points(training$age, predict(lm1, newdata=training), col="red", pch=19, cex=0.5)
predict(bsBasis, age=testing$age)
The result of predict(bsBasis, age=testing$age) has dimensions 2012x3, while testing$age has only 988 rows. The result is also identical to bsBasis.
My questions are:
1. What is predict(bsBasis, age=testing$age) actually doing?
2. How do I use bsBasis in predicting the wage in the TEST data correctly?
Upvotes: 0
Views: 761
Reputation: 73385
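Your question 1
Use newx. Check ?predict.bs for its arguments. age is not an argument of predict.bs(), so it is silently absorbed by ... and ignored; with newx missing, the function simply returns the original basis, which is why you get the 2012x3 matrix back.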
x <- runif(100)
b <- bs(x, df = 3)
predict(b, newx = c(0.2, 0.5))
Different predict functions may behave differently. Here, no matter which variable you pass to bs() (age, sex, height, etc.), the argument in predict.bs() can only be newx.
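To make the difference between newx and any other argument name concrete, here is a minimal sketch on simulated data. The names x_train, x_new, wrong and right are made up for illustration, and the values are random rather than taken from the Wage data.

```r
library(splines)

set.seed(1)
x_train <- runif(100)         # hypothetical training predictor
x_new   <- c(0.2, 0.5, 0.8)   # hypothetical new values to evaluate the basis at

b <- bs(x_train, df = 3)

# `age` is not a formal argument of predict.bs(), so it is ignored and the
# training basis is returned unchanged.
wrong <- predict(b, age = x_new)
dim(wrong)             # same dimensions as b
identical(wrong, b)

# With `newx`, the basis is re-evaluated at the new points using the
# knots and boundary knots stored in b.
right <- predict(b, newx = x_new)
dim(right)             # one row per element of x_new
```

The second call is the one that yields a basis matrix with as many rows as the new data.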
Your question 2
You don't really need to form bsBasis explicitly. When you use splines in a regression, lm and predict.lm hide the construction of the spline basis and its evaluation at new data from you.
lm1 <- lm(wage ~ bs(age, df = 3), data=training)
predict(lm1, newdata = testing)
Note that the argument in predict.lm is newdata.
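If you do want to keep an explicit basis, the following sketch compares the formula route with a manual route that evaluates the basis at the test ages via newx and applies the fitted coefficients by hand. It uses simulated data as a stand-in for Wage; the data frames train and test and all values in them are assumptions for illustration only.

```r
library(splines)

set.seed(1)
# Simulated stand-in for the Wage data (hypothetical values)
train <- data.frame(age = runif(200, 18, 80))
train$wage <- 60 + 0.5 * train$age + rnorm(200, sd = 5)
test <- data.frame(age = runif(50, 25, 75))

# Formula route: predict.lm rebuilds the spline basis for the new ages
fit <- lm(wage ~ bs(age, df = 3), data = train)
p1 <- predict(fit, newdata = test)

# Manual route: explicit training basis, re-evaluated at the test ages
B_train <- bs(train$age, df = 3)
fit2 <- lm(train$wage ~ B_train)
B_test <- predict(B_train, newx = test$age)
p2 <- drop(cbind(1, B_test) %*% coef(fit2))   # intercept column + basis columns

# The two routes should agree up to numerical tolerance
all.equal(unname(p1), unname(p2))
```

The formula route is shorter and harder to get wrong, which is the reason to prefer it unless you need the basis matrix itself.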
Upvotes: 2