lbcommer

Reputation: 1035

Caret and custom validation set with indexOut produces strange result

Caret lets you set a custom training and validation set in train via the trainControl options index and indexOut, but when the fitted model is applied to the validation set and its performance is measured, the result is very different from the one reported by the model itself:

library(caret)
library(Metrics)

set.seed(123)
index_on <- 1:16   # rows used to fit the resample model
index_out <- 17:32 # rows held out for the performance estimate
fit <- train(mpg~wt+qsec,
             mtcars,
             method = "glm", 
             metric = "RMSE",
             trControl = trainControl(method="cv", 
                                      index = list(index_on), 
                                      indexOut = list(index_out))
             ) 
fit$results$RMSE                                                 # RMSE reported by train
rmse(mtcars[index_out, "mpg"], predict(fit, mtcars[index_out,])) # RMSE computed directly on the hold-out rows

As you can see, the performance value taken from the train object differs from the one calculated directly with predict:

[1] 3.612743

[1] 3.079445
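To rule out that the custom split was simply being ignored, you can also inspect the indices that train stored; this just prints back what was passed to trainControl:

fit$control$index
fit$control$indexOut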

Is this a bug? Am I missing something here?

Upvotes: 3

Views: 597

Answers (1)

lbcommer

Reputation: 1035

I have been investigating, and it looks like train internally fits the expected model and computes the performance with it, but then returns a different model: the final model it hands back is refit on ALL the data (not only the "index" rows).

You can see that with this code:

set.seed(123)
# method = "none" skips resampling, so this model is fit on ALL rows of mtcars
fit_3 <- train(mpg ~ wt + qsec,
               data = mtcars,
               method = "glm",
               metric = "RMSE",
               trControl = trainControl(method = "none"))

rmse(mtcars[index_out, "mpg"], predict(fit_3, mtcars[index_out,]))

which produces:

[1] 3.079445
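Conversely, the 3.612743 that train reports seems to come from a model fit on only the "index" rows. A minimal sketch of that idea, assuming the resample model is an ordinary glm on those rows (fit_manual is just an illustrative name):

# fit by hand on the "index" rows only, then score on the "indexOut" rows;
# this should reproduce the value stored in fit$results$RMSE
fit_manual <- glm(mpg ~ wt + qsec, data = mtcars[index_on, ])
rmse(mtcars[index_out, "mpg"], predict(fit_manual, mtcars[index_out, ]))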

I'm using the latest caret version (caret_6.0-75 at the moment). It was pretty clear that this is a bug, and I was going to report it when I found it is already an open issue:

https://github.com/topepo/caret/issues/348
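You can also see the same thing without refitting anything by asking train to keep its hold-out predictions. A minimal sketch using the savePredictions option of trainControl (fit_sp is just an illustrative name); the RMSE of those stored predictions should match fit_sp$results$RMSE rather than the value obtained from the returned final model:

set.seed(123)
fit_sp <- train(mpg ~ wt + qsec,
                data = mtcars,
                method = "glm",
                metric = "RMSE",
                trControl = trainControl(method = "cv",
                                         index = list(index_on),
                                         indexOut = list(index_out),
                                         savePredictions = "final"))

# predictions made on the "indexOut" rows by the model fit on the "index" rows
rmse(fit_sp$pred$obs, fit_sp$pred$pred)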

Upvotes: 1
