I am attempting to plot the precision-recall curve and measure the area under it for a caret cross-validated train object. Printing the object already reports an area under the precision-recall curve for each tuning parameter combination:
> rf
Random Forest

807 samples
 11 predictor
  2 classes: 'X0', 'X1'

No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 727, 726, 727, 726, 727, 726, ...
Resampling results across tuning parameters:

  mtry  splitrule   AUC        Precision  Recall     F
   2    gini        0.8179379  0.8618888  0.6713675  0.7494214
   2    extratrees  0.8061601  0.8960233  0.5725071  0.6901257
   7    gini        0.7798593  0.8775955  0.8037037  0.8360293
   7    extratrees  0.8004585  0.8587664  0.7696581  0.8090205
  12    gini        0.7659204  0.8578710  0.8229345  0.8364962
  12    extratrees  0.7840497  0.8498209  0.7925926  0.8167108

Tuning parameter 'min.node.size' was held constant at a value of 1
AUC was used to select the optimal model using the largest value.
The final values used for the model were mtry = 2, splitrule = gini and min.node.size = 1.
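For what it's worth, the per-fold values that caret averages into this summary can be inspected directly (as far as I understand, rf$resample holds one row per fold for the selected tuning parameters):

library(caret)

# Per-fold metrics for the final tuning parameters; their mean should
# reproduce the AUC = 0.8179 reported in the summary above
rf$resample
mean(rf$resample$AUC)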
However, when I try to plot the actual curve with yardstick, I get quite different results.
library(yardstick)
library(ggplot2)

# Precision-recall curve from the resampled hold-out predictions,
# using the predicted probability of class X0
prRf <- pr_curve(rf$pred, X0, truth = obs)

ggplot() +
  geom_path(aes(x = recall, y = precision), colour = "blue",
            linetype = 1, data = prRf) +
  xlab("Recall") +
  ylab("Precision") +
  theme_minimal() +
  ylim(0, 1)

pr_auc(rf$pred, X0, truth = obs)
Here, the curve looks vastly "better" and the AUPR is higher than the internal value reported by caret (0.878 vs. 0.817). The same holds for a quick run of MLeval, which gives similarly "better" results:
library(MLeval)

# Evaluate the saved resampling predictions of the caret train object
evalm(rf)
All of this is confusing me quite a bit. I suspect I may somehow be evaluating within-sample, but I am unsure how to do this correctly without splitting off a separate test set beforehand.
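One thing I am not sure about is whether rf$pred contains the hold-out predictions for every tuning parameter combination rather than only the final model. A minimal check, assuming rf$bestTune holds the selected mtry, splitrule and min.node.size:

library(yardstick)

# Keep only the predictions made with the final tuning parameters
# (merge() joins on the tuning columns shared with rf$bestTune)
predBest <- merge(rf$pred, rf$bestTune)
pr_auc(predBest, X0, truth = obs)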