Catherine

Reputation: 119

predict in R gives wrong number of predictions

I split my data set into 2 data frames: train (having 830 rows) and test (200 rows). The column names are identical and in the same order.

I built a natural spline model to predict strength on a single variable, cement.

When I try to use that model to make predictions on my test set, instead of getting the expected 200 predictions, I get 830 predictions. I don't know why this is happening. I've been through the help pages and the web, but I haven't found anything that fixes this problem.

I have checked the dimensions of test$cement and it does have only 200 entries.

Here is my code right now:

library(tidyverse)
library(caret)
library(splines)

attach(train)
fit1 <- lm(strength~ns(cement, 4), data = train)
summary(fit1)
pred1 <- predict(fit1, newdata = data.frame(test$cement), se=T)
pred1
detach(train)

I've also tried these predict versions:

pred2 <- fit1 %>% predict(test$cement)

--> which gives me errors saying it doesn't understand %>%

pred = predict.bSpline(fit1, newdata = test$cement, se=T)

and

pred = predict.bSpline2(fit1, newdata = test$cement, se=T)

--> Both of which tell me they cannot find the function predict.bSpline or predict.bSpline2, although I opened up both the splines and the splines2 libraries.

Also predict.ns doesn't seem to exist.

Any help would be greatly appreciated.

Upvotes: 1

Views: 1808

Answers (2)

MDEWITT

Reputation: 2368

I think we would need to see a reproducible example of your data set, because I suspect the data is what is causing your issue. When I use the following code I get the expected number of predictions:

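library(dplyr)
library(splines)

# 80/20 split of mtcars; setdiff() keeps the rows not sampled into train
train <- sample_frac(mtcars, .8)
test <- setdiff(mtcars, train)

# natural spline with 4 degrees of freedom on a single predictor
fit1 <- lm(mpg ~ ns(wt, 4), data = train)

# newdata is the whole test data frame, so it contains a column named wt
pred1 <- predict(fit1, newdata = test, se.fit = TRUE)
pred1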

Since you are loading caret, I would also make sure that you are not creating a list object when you build your train/test split: createDataPartition returns a list by default, so call it with list = FALSE.

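Additionally, I would not use attach. It is a little more typing to write out all the variable names, but attach puts the columns of train on the search path, so a variable that predict cannot find in newdata can be picked up from train instead. That can give odd results, which might be what is happening to you. I would restart your R session, remove the attach and then try again.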

Not the best solution, but I would give it a go.

Upvotes: 1

Krzysztof Nowicki

Reputation: 100

I don't have your data, but you should try passing the whole data frame test as newdata:

pred1 <- predict(fit1, newdata = test, se=T)
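A minimal sketch of why passing the whole data frame helps, using mtcars as a stand-in for the original data (the split, the seed and the variable names here are assumptions, not the asker's data). data.frame(test$wt) names its only column test.wt, so predict cannot find wt in newdata; with train attached it falls back to the training column and returns one prediction per training row:

```r
library(splines)

set.seed(1)
# Hypothetical stand-in for the asker's train/test split
idx   <- sample(nrow(mtcars), 24)
train <- mtcars[idx, ]
test  <- mtcars[-idx, ]

fit <- lm(mpg ~ ns(wt, 4), data = train)

# The column is named after the expression, not after the variable
bad_newdata <- data.frame(test$wt)
names(bad_newdata)   # "test.wt"

# With train attached, wt is found on the search path instead of in newdata.
# R warns about the row mismatch; the warning is suppressed here for brevity.
attach(train)
pred_bad <- suppressWarnings(predict(fit, newdata = bad_newdata))
detach(train)
length(pred_bad) == nrow(train)    # TRUE: predictions for the training rows

# Passing a data frame that really has a wt column gives one value per test row
pred_ok <- predict(fit, newdata = test, se.fit = TRUE)
length(pred_ok$fit) == nrow(test)  # TRUE
```

If you only want to pass the one column, data.frame(wt = test$wt) should work as well, since the column then has the name the formula expects.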

Also, the "%>%" operator is defined in the "magrittr" package and re-exported by "dplyr" and "tidyr", so one of those has to be loaded before you can use it.

I think that predict.ns and predict.bSpline are methods for the generic predict function rather than functions you call directly; look up its documentation. Although I've never used spline objects, what I've read suggests that you just call the normal "predict" function on an object of the type this library creates.
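To illustrate the point about methods, here is a small sketch (again with mtcars as an assumed stand-in) showing that predict dispatches on the class of the object, so there is no need to call predict.ns by name:

```r
library(splines)

# ns() returns a basis matrix whose class vector includes "ns"
basis <- ns(mtcars$wt, df = 4)
class(basis)

# The method is registered for the generic, so it can be looked up...
is.function(getS3method("predict", "ns"))

# ...and plain predict() picks it for an "ns" object:
# evaluate the same basis at two new wt values
new_basis <- predict(basis, newx = c(2.5, 3.0))
dim(new_basis)   # 2 rows, 4 basis columns

# For an lm fit the same generic dispatches to the lm method instead
fit <- lm(mpg ~ ns(wt, 4), data = mtcars)
predict(fit, newdata = data.frame(wt = c(2.5, 3.0)))
```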

Upvotes: 1
