Reputation: 59
I have tried different MATLAB functions such as plotroc, and R packages such as pROC, ROCR and cvAUC. Each package or function produces a different graph and gives a different AUC than the Weka result.
I would like to compare multiple classifiers using 10-fold cross-validation and plot the ROC curve of each. I have collected the results in Weka, but I don't want to plot them in Weka itself.
My experiments are based on both binary class and multi-class data.
My Weka output cross-validated instance predictions are at https://drive.google.com/folderview?id=0BzRIG3kN-8Z9fnh5OElKTExNT2NuZUVna2tKcmMzU1VBankwdVc2OGxBSXFnaFJqSEhHYVE&usp=sharing
Please suggest how I can plot ROC graphs from the attached results, for the binary-class as well as the multi-class case.
Upvotes: 3
Views: 1577
Reputation: 59
I didn't find an exact solution to the issue. However, here are some points that I observed from the Weka output:
Apart from this, I didn't understand how Weka calculates the threshold values from the predictions. With the same Weka prediction output, I got different threshold values in Weka than in R (and MATLAB).
Finally, I used the Weka API code for plotting ROC curves (Generate ROC Curve) to extract the TPR and FPR for the experiments, which meant re-running all of them. With the TPR and FPR extracted, I can plot the graphs in any tool, such as Excel, gnuplot, MATLAB or R.
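For reference, here is a minimal base-R sketch of one common way to turn scores into TPR/FPR pairs: sort by score and treat each distinct score as a threshold. This is not Weka's implementation; the helper names `roc_points` and `trapezoid_auc` and the toy data are made up for illustration. It assumes `scores` holds the estimated probability of the positive class; if the exported column instead holds the probability of whichever class was predicted, it would need converting first. Tools can differ in how they treat tied scores and which thresholds they report, which may be one reason the threshold values do not match.

```r
# Hypothetical helper: ROC points from numeric scores and 0/1 labels
# (1 = positive class). Each distinct score is used as a threshold.
roc_points <- function(scores, labels) {
  ord <- order(scores, decreasing = TRUE)
  scores <- scores[ord]
  labels <- labels[ord]
  tp <- cumsum(labels == 1)  # true positives when cutting after each row
  fp <- cumsum(labels == 0)  # false positives when cutting after each row
  # For tied scores keep only the last row of the tie, so that all
  # instances with the same score fall on the same side of the threshold.
  keep <- !duplicated(scores, fromLast = TRUE)
  data.frame(
    threshold = c(Inf, scores[keep]),
    fpr = c(0, fp[keep] / sum(labels == 0)),
    tpr = c(0, tp[keep] / sum(labels == 1))
  )
}

# Hypothetical helper: area under the ROC points by the trapezoidal rule.
trapezoid_auc <- function(roc) {
  sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
}

# Toy data, invented for this example (not taken from the Weka files).
scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4)
labels <- c(1, 1, 0, 1, 0, 0)

roc <- roc_points(scores, labels)
auc <- trapezoid_auc(roc)
print(roc)
print(auc)
```

The resulting data frame can be drawn with `plot(roc$fpr, roc$tpr, type = "l")` or exported with `write.csv` for plotting in another tool.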
Upvotes: 1
Reputation: 10984
This is a placeholder answer, but the first thing to note is that one of your observations got cross-validated fewer than 10 times:
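library(pROC)
library(dplyr)
filenameROC = "Data/term3_IBk_3_multiclass.txt"
fileROC = readLines(filenameROC)
# Skip the 19 lines before the predictions table and
# leave out the 51 lines that follow it
dfCV = read.csv2(text = fileROC,
                 nrows = length(fileROC) - 51 - 19,
                 header = TRUE,
                 sep = ",",
                 skip = 19, stringsAsFactors = FALSE)
# Count how often each instance appears and list those with fewer than 10 rows
dfCV %>%
  group_by(inst.) %>%
  tally() %>%
  filter(n < 10)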
Which gives:
> dfCV %>%
+ group_by(inst.) %>%
+ tally() %>%
+ filter( n < 10)
Source: local data frame [1 x 2]
inst. n
1 773 4
Can you explain this?
Additionally, you also need to add a cross-validation iteration identifier. Once you do that, it is simply a question of running multiclass.roc from the pROC package by CV iteration.
OP claims that there are 7724 observations, whereas it is easy to see that there are 773 observations: 772 of them are repeated 10 times and observation number 773 only 4 times (772 × 10 + 4 = 7724) -- consistent with 10-fold cross-validation data:
> dfCV %>%
+ group_by(inst.) %>%
+ tally()
Source: local data frame [773 x 2]
inst. n
1 1 10
2 2 10
3 3 10
4 4 10
5 5 10
6 6 10
7 7 10
8 8 10
9 9 10
10 10 10
.. ... ..
Here is the code to produce the multi-class ROC by CV fold:
dfCVROC = dfCV %>%
  dplyr::filter(inst. != 773) %>%
  arrange(inst.) %>%
  dplyr::mutate(cvfold = rep.int(1:10, 772)) %>%
  group_by(cvfold) %>%
  do(multiclass_roc = multiclass.roc(as.factor(.$actual), as.numeric(.$prediction)))
# get the AUCs by CV fold
sapply(dfCVROC$multiclass_roc, function(x) x$auc)
Upvotes: 2