B.Gees

Reputation: 1165

Is pROC area under the curve incorrect?

My current problem: I used the caret package to build classification models, and I want to validate them with a specific metric (ROC AUC). The AUC metric is available when training a model on the training set (internal validation), but NOT when predicting (external validation).

1. Internal validation :

Fit <- train(X, Y$levels, method = "svmRadial",
             trControl = fitControl, tuneLength = 20, metric = "ROC")

Results :

  sigma     C   ROC  Sens  Spec  ROCSD  SensSD  SpecSD
 0.0068  2.00  0.83  0.82  0.57  0.149   0.166   0.270
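(fitControl is not shown above. For caret to compute the "ROC" metric during training, the control object must request class probabilities and use twoClassSummary. A minimal sketch, with assumed resampling settings:)

library(caret)

fitControl <- trainControl(method = "repeatedcv",             # assumed resampling scheme
                           number = 10, repeats = 3,          # assumed fold/repeat counts
                           classProbs = TRUE,                  # required for ROC computation
                           summaryFunction = twoClassSummary)  # reports ROC, Sens, Spec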

2. External Validation :

To obtain an external-validation AUC, I predicted on my training set and computed the metric directly with pROC.

predictions <- as.vector(predict(Fit$finalModel, newdata = X))
data <- data.frame(pred = as.numeric(predictions), obs = as.numeric(Y$levels))
pROC::roc(data$pred, data$obs)

Results : Area under the curve: 0.9057
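(As an editorial aside: pROC's conventional argument order is roc(response, predictor), and passing hard class labels as the predictor yields only a two-point ROC curve. A sketch of the usual pattern, assuming classProbs = TRUE was set in fitControl; the probability column name depends on your factor levels:)

probs <- predict(Fit, newdata = X, type = "prob")  # per-class probabilities from the train object
roc_obj <- pROC::roc(response = Y$levels,          # observed classes
                     predictor = probs[[1]])       # probability of the first class
pROC::auc(roc_obj)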

3. Conclusion :

Results : AUC(internal validation) != AUC(external validation), even though I used the same data (the training set) to check my external-validation ROC criterion. At best, I expected to obtain a maximum value of 0.83; it seems very odd to me that AUC(internal validation) < AUC(external validation).

I have no idea how to solve this enigma. Any assistance is welcome.

Upvotes: 1

Views: 459

Answers (1)

Mike Wise

Reputation: 22847

Your results are to be expected. The "internally validated" AUC is computed on resampled test cases that were held out from the training cases, whereas in your "external validation" you are testing on the very cases you trained on (which is cheating, of course). So the internally validated AUC should be expected to be smaller than this "externally validated" AUC. I think the following diagram makes that clear:

[diagram: training cases vs. held-out test cases]
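(A self-contained sketch on toy data, not the asker's, that reproduces the effect: the cross-validated AUC comes from held-out folds, while rescoring the training set, i.e. resubstitution, is optimistically biased:)

library(caret)
library(pROC)

set.seed(1)
dat <- twoClassSim(200)                    # simulated two-class data shipped with caret

ctrl <- trainControl(method = "cv", number = 10,
                     classProbs = TRUE, summaryFunction = twoClassSummary)

fit <- train(Class ~ ., data = dat, method = "svmRadial",
             metric = "ROC", trControl = ctrl)

max(fit$results$ROC)                       # cross-validated AUC (held-out folds)

p <- predict(fit, newdata = dat, type = "prob")
auc(roc(dat$Class, p$Class1))              # resubstitution AUC: typically higher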

Upvotes: 0
