Reputation: 119
I split my data set into 2 data frames: train (having 830 rows) and test (200 rows). The column names are identical and in the same order.
I built a natural spline model to predict strength on a single variable, cement.
When I try to use that model to make predictions on my test set, instead of getting the expected 200 predictions, I get 830 predictions. I don't know why this is happening. I've been through the help pages and the web, but I haven't found anything that fixes this problem.
I have checked the dimensions of test$cement and it does have only 200 entries.
Here is my code right now:
library(tidyverse)
library(caret)
library(splines)
attach(train)
fit1 <- lm(strength~ns(cement, 4), data = train)
summary(fit1)
pred1 <- predict(fit1, newdata = data.frame(test$cement), se=T)
pred1
detach(train)
I've also tried these predict versions:
pred2 <- fit1 %>% predict(test$cement)
--> which gives me errors saying it doesn't understand %>%
pred = predict.bSpline(fit1, newdata = test$cement, se=T)
and
pred = predict.bSpline2(fit1, newdata = test$cement, se=T)
--> Both of which tell me they cannot find the function predict.bSpline or predict.bSpline2, although I opened up both the splines and the splines2 libraries.
Also predict.ns doesn't seem to exist.
Any help would be greatly appreciated.
Upvotes: 1
Views: 1808
Reputation: 2368
I think we would need to see a reproducible example of your data set, because I suspect the data might be causing your issue. When I use the following code I get the expected results:
library(dplyr)
library(splines)
set.seed(1)  # make the random split reproducible
train <- sample_frac(mtcars, .8)
test <- setdiff(mtcars, train)  # rows of mtcars that are not in train
fit1 <- lm(mpg ~ ns(wt, 4), data = train)
pred1 <- predict(fit1, newdata = test, se.fit = TRUE)
pred1
Just thinking about caret: I would make sure that you are not creating a list object when you create your train/test splits (createDataPartition(list = FALSE)).
Additionally, I would not use attach. It is a little more typing to write out all the variable names, but with attach you can sometimes get odd results, which might be what is happening to you. I would restart your R session, remove the attach, and then try again.
Not the best solution, but I would give it a go.
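To illustrate why attach can produce odd results here, below is a minimal sketch on simulated data. The train and test frames are hypothetical stand-ins with the same row counts as in the question, not the asker's data. The sketch assumes the usual lookup behaviour of predict: a variable missing from newdata is searched for elsewhere, and an attached data frame sits on that search path.

```r
library(splines)

set.seed(1)
# Hypothetical stand-ins for the question's data (830 training rows, 200 test rows)
train <- data.frame(cement = runif(830, 100, 500))
train$strength <- 0.1 * train$cement + rnorm(830)
test <- data.frame(cement = runif(200, 100, 500))

fit1 <- lm(strength ~ ns(cement, 4), data = train)

# This frame has no column called "cement" (see names(bad_newdata))
bad_newdata <- data.frame(test$cement)

# Without attach(): `cement` cannot be found, so predict() stops with an error
no_attach <- tryCatch(predict(fit1, newdata = bad_newdata),
                      error = function(e) e)
inherits(no_attach, "error")

# With attach(train): `cement` is picked up from the attached training data
attach(train)
with_attach <- predict(fit1, newdata = bad_newdata)
detach(train)
length(with_attach)  # one value per training row, not per test row
```

In the attached case R normally also warns that newdata and the variables it found have different numbers of rows, which is easy to overlook in a long console printout.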
Upvotes: 1
Reputation: 100
I don't have your data, but you should try passing the whole test data frame:
pred1 <- predict(fit1, newdata = test, se=T)
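As a rough sketch of why this should help, the example below uses simulated data (the train and test frames are made up here, with the row counts from the question). It shows that data.frame(test$cement) produces a column that is not named cement, whereas passing test itself gives predict a cement column to work with.

```r
library(splines)

set.seed(42)
# Simulated stand-ins for the asker's data; the values are made up
train <- data.frame(cement = runif(830, 100, 500))
train$strength <- 0.1 * train$cement + rnorm(830)
test <- data.frame(cement = runif(200, 100, 500))

fit1 <- lm(strength ~ ns(cement, 4), data = train)

# The column name is derived from the expression, not from the variable
bad_newdata <- data.frame(test$cement)
names(bad_newdata)  # "test.cement"

# Passing the data frame itself keeps the column name used in the formula
pred1 <- predict(fit1, newdata = test, se.fit = TRUE)
length(pred1$fit)   # same as nrow(test)
```

If you only want to pass the one column, data.frame(cement = test$cement) or test["cement"] should keep the name intact as well.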
Also, the "%>%" operator comes from the "magrittr" package, and I believe "dplyr" and "tidyr" re-export it, so loading one of those should make it available.
I think that predict.ns and predict.bSpline are methods behind the generic predict function rather than functions you call directly; look up its documentation. I've never used spline objects, but what I've read suggests that you just call the normal "predict" function on an object from this library.
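Here is a small sketch of that dispatch idea, using only the splines package that ships with R and toy data invented for the example. As far as I can tell, predict.ns is registered as an S3 method rather than exported, which would explain why typing predict.ns directly finds nothing.

```r
library(splines)

# Toy data, made up for the illustration
x <- seq(1, 10, length.out = 50)
y <- sin(x)

basis <- ns(x, df = 4)
class(basis)  # the class vector includes "ns"

# The method can be looked up through the S3 registry
ns_method <- getS3method("predict", "ns")
is.function(ns_method)

# Calling the generic on the basis object dispatches to that method
new_basis <- predict(basis, newx = c(2.5, 7.5))
dim(new_basis)  # one row per new x value, one column per basis function

# A model fitted with lm() is an "lm" object, so predict() uses the lm method
fit <- lm(y ~ ns(x, df = 4))
new_pred <- predict(fit, newdata = data.frame(x = c(2.5, 7.5)))
length(new_pred)
```

Either way you only ever type predict(); R picks the method from the class of the first argument.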
Upvotes: 1