Reputation: 33
In the caret package, the trainControl function lets us perform a variety of cross-validation schemes. To perform 10-fold cross-validation (here repeated 10 times), one would use
fitControl <- trainControl(method = "repeatedcv", number = 10, repeats = 10)
fitJ48_10_fold <- train(x = x, y = y, method = "J48", trControl = fitControl)
while for fitting on the training set without resampling, it is
fitControl <- trainControl(method = "none")
fitJ48train <- train(x = x, y = y, method = "J48", trControl = fitControl)
However, the confusion matrices of these two models are identical for both 10-fold cross-validation and training.
Activity <- predict(fitJ48_10_fold, newdata = Train)
confusionMatrix(Activity, Train$Activity)
Activity <- predict(fitJ48train, newdata = Train)
confusionMatrix(Activity, Train$Activity)
I also used the Weka classifier GUI, and there the performance of J48 under 10-fold cross-validation is indeed lower than on the training set. Am I wrong to suspect that trainControl from caret isn't working, or am I passing it in the wrong way?
Upvotes: 2
Views: 11534
Reputation: 14331
Am I wrong to suspect that trainControl from caret isn't working, or am I passing it in the wrong way?
A little. For J48, there is a tuning parameter, but the default grid only fits a single value, C = 0.25. The final model will be the same no matter what value of method you use in trainControl, so the confusion matrices will always be the same.
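To see the difference Weka reports, it may help to look at the resampling results stored on the train object rather than at predict() on the training rows, since predict() always uses the final model refit on all of the data. The following is only a minimal sketch: iris stands in for the asker's data, the names cvControl, fitCV, trainAcc and cvAcc are made up for illustration, tuneLength = 1 is assumed to give a single candidate model in your caret version, and RWeka (with Java) is assumed to be installed.

```r
# Sketch only: iris stands in for the asker's data.
# Assumes the caret and RWeka packages (and Java) are installed.
library(caret)

set.seed(1)
x <- iris[, 1:4]
y <- iris$Species

# 10-fold cross-validation repeated 10 times.
# tuneLength = 1 asks for a single candidate model (assumption about the grid).
cvControl <- trainControl(method = "repeatedcv", number = 10, repeats = 10)
fitCV <- train(x = x, y = y, method = "J48",
               trControl = cvControl, tuneLength = 1)

# Resubstitution accuracy: predict() uses the final model,
# which is refit on all training rows regardless of the resampling method.
trainAcc <- mean(predict(fitCV, newdata = x) == y)

# Cross-validated accuracy, computed on the held-out folds during train().
cvAcc <- fitCV$results$Accuracy

print(c(training = trainAcc, cross_validated = cvAcc))

# Resampled confusion matrix, built from the held-out predictions.
print(confusionMatrix(fitCV))
```

With method = "none" there is no resampling, so the results element carries no performance estimate and confusionMatrix() on the train object has nothing to summarize; only the resubstitution comparison applies there.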
Max
Upvotes: 1