b.bhavesh

Reputation: 59

How to plot ROC curve for cross validation from Weka output for binary class and multiclass data?

I have tried several Matlab functions such as plotroc and R packages such as pROC, ROCR and cvAUC. Each one produces a different graph and a different AUC from the one Weka reports.

I would like to compare multiple classifiers using 10-fold cross-validation and plot the ROC curve of each. I have collected the results in Weka, but I don't want to plot them in Weka itself.

My experiments are based on both binary class and multi-class data.

My Weka output cross-validated instance predictions are at https://drive.google.com/folderview?id=0BzRIG3kN-8Z9fnh5OElKTExNT2NuZUVna2tKcmMzU1VBankwdVc2OGxBSXFnaFJqSEhHYVE&usp=sharing

Please suggest how I can plot ROC curves from the attached results for the binary-class as well as the multiclass data.

Upvotes: 3

Views: 1577

Answers (2)

b.bhavesh

Reputation: 59

I didn't find an exact solution to the issue. However, here are some points that I observed from the Weka output:

  1. When Weka plots the ROC curve, it takes the predictions directly from the classifier evaluation output.
  2. Weka uses predictions with up to 6 decimal places when calculating threshold values (more precision yields more distinct threshold values for the ROC curve).
  3. By default, the Weka Explorer prints classifier predictions with only 3 decimal places (as in my attached experiment results).

Apart from this, I didn't understand how Weka calculates the threshold values from the predictions. With the same Weka prediction output, I got different threshold values in Weka and in R (and Matlab).
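To illustrate why the number of decimal places matters, here is a minimal base R sketch. It uses the common convention that every distinct score serves as a threshold; this is an assumption for illustration, not a description of how Weka derives its thresholds. The helper name roc_points and the scores and labels are made up.

```r
# Hypothetical helper: ROC points where every distinct score is a threshold.
# This is a common convention, not necessarily what Weka does internally.
roc_points <- function(scores, labels) {
  thresholds <- sort(unique(scores), decreasing = TRUE)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  # An instance is predicted positive when its score is >= the threshold
  tpr <- sapply(thresholds, function(t) sum(scores >= t & labels == 1) / n_pos)
  fpr <- sapply(thresholds, function(t) sum(scores >= t & labels == 0) / n_neg)
  # Prepend the (0, 0) point, where nothing is predicted positive
  data.frame(threshold = c(Inf, thresholds), fpr = c(0, fpr), tpr = c(0, tpr))
}

# Made-up scores with 6 decimal places and made-up class labels
scores <- c(0.912345, 0.912401, 0.750120, 0.749950, 0.300499, 0.300012)
labels <- c(1, 0, 1, 0, 1, 0)

full    <- roc_points(scores, labels)            # 6 distinct thresholds
rounded <- roc_points(round(scores, 3), labels)  # ties collapse to 3 thresholds

nrow(full)
nrow(rounded)
```

With the rounded scores, pairs of instances become tied, so the curve has fewer points and a different shape. That alone could explain part of the AUC difference between tools fed the 3-decimal output and Weka itself.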

Finally, I used the Weka API code for plotting ROC curves (Generate ROC Curve) to extract the TPR and FPR for each experiment (I re-ran all the experiments). Once the TPR and FPR are extracted, the graphs can be plotted in any tool such as Excel, gnuplot, Matlab or R.
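As a rough sketch of that last step in base R, assuming the extracted points are available as two columns named fpr and tpr (hypothetical names, with made-up values standing in for the real export):

```r
# Made-up (FPR, TPR) pairs standing in for the points extracted from Weka
roc <- data.frame(fpr = c(0, 0, 0.5, 0.5, 1),
                  tpr = c(0, 0.5, 0.5, 1, 1))

# Sort along the curve so the trapezoids are taken in order
roc <- roc[order(roc$fpr, roc$tpr), ]

# Area under the curve by the trapezoidal rule
auc <- sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)

# Plot the curve together with the chance diagonal
plot(roc$fpr, roc$tpr, type = "l",
     xlab = "False positive rate", ylab = "True positive rate",
     main = sprintf("ROC (AUC = %.3f)", auc))
abline(0, 1, lty = 2)
```

Curves for several classifiers can be overlaid by calling lines() with each additional set of points after the first plot().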

Upvotes: 1

tchakravarty

Reputation: 10984

This is a placeholder answer, but the first thing to note is that one of your observations got cross-validated fewer than 10 times:

library(pROC)
library(dplyr)

filenameROC = "Data/term3_IBk_3_multiclass.txt"
fileROC = readLines(filenameROC)
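# Skip the 19 lines above the column header and stop before the
# trailing non-data lines of the Weka output
dfCV = read.csv2(text = fileROC,
                 nrows = length(fileROC) - 51 - 19,
                 header = TRUE,
                 sep = ",",
                 skip = 19,
                 stringsAsFactors = FALSE)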


dfCV %>%
  group_by(inst.) %>%
  tally() %>%
  filter(n < 10)

Which gives:

> dfCV %>%
+   group_by(inst.) %>%
+   tally() %>%
+   filter( n < 10)
Source: local data frame [1 x 2]

  inst. n
1   773 4

Can you explain this?

You also need to add a cross-validation iteration identifier. Once you do that, it is simply a matter of running multiclass.roc from the pROC package for each CV iteration.

Edit:

OP claims that there are 7724 observations, whereas it is easy to see that there are 773 observations: 772 of them are repeated 10 times and observation number 773 is repeated 4 times, which is consistent with 10-fold cross-validation data:

> dfCV %>%
+   group_by(inst.) %>%
+   tally()
Source: local data frame [773 x 2]

   inst.  n
1      1 10
2      2 10
3      3 10
4      4 10
5      5 10
6      6 10
7      7 10
8      8 10
9      9 10
10    10 10
..   ... ..

Edit 2:

Here is the code to produce the multi-class ROC by CV fold:

dfCVROC = dfCV %>%
  dplyr::filter(inst. != 773) %>%
  arrange(inst.) %>%
  dplyr::mutate(cvfold = rep.int(1:10, 772)) %>%
  group_by(cvfold) %>%
  do(multiclass_roc = multiclass.roc(as.factor(.$actual), as.numeric(.$prediction)))

# get the AUCs by CV fold
sapply(dfCVROC$multiclass_roc, function(x) x$auc)

Upvotes: 2
