racunen

Reputation: 23

How can I calculate the F1-measure and ROC in a multiclass classification problem in R?

I have this code for a multiclass classification problem:

library(caret)  # createDataPartition
library(OneR)   # OneR, eval_model
library(pROC)   # roc, auc
library(ROCR)   # prediction, performance

data$Class = as.factor(data$Class)
levels(data$Class) <- make.names(levels(factor(data$Class)))
trainIndex <- createDataPartition(data$Class, p = 0.6, list = FALSE, times=1)
trainingSet <- data[ trainIndex,]
testingSet  <- data[-trainIndex,]
train_x <- trainingSet[, -ncol(trainingSet)]
train_y <- trainingSet$Class

testing_x <- testingSet[, -ncol(testingSet)]
testing_y <- testingSet$Class

oneRM <- OneR(trainingSet, verbose = TRUE)
oneRM
summary(oneRM)
plot(oneRM)    

oneRM_pred <- predict(oneRM, testing_x)
oneRM_pred

eval_model(oneRM_pred, testing_y)


AUC_oneRM_pred <- auc(roc(oneRM_pred,testing_y))
cat ("AUC=", oneRM_pred)

# Recall-Precision curve
oneRM_prediction <- prediction(oneRM_pred, testing_y)
RP.perf <- performance(oneRM_prediction, "prec", "rec")

plot (RP.perf)

plot(roc(oneRM_pred,testing_y))

But the code does not work. After this line:

oneRM_prediction <- prediction(oneRM_pred, testing_y)

I get this error:

Error in prediction(oneRM_pred, testing_y) : Format of predictions is invalid.

In addition, I don't know how I can easily get the F1-measure.

Finally, one more question: does it make sense to calculate AUC in a multi-class classification problem?

Upvotes: 1

Views: 1965

Answers (2)

racunen

Reputation: 23

If I use levels(oneRM_pred) <- levels(testing_y) like this:

...
oneRM <- OneR(trainingSet, verbose = TRUE)
oneRM
summary(oneRM)
plot(oneRM)    

oneRM_pred <- predict(oneRM, testing_x)
levels(oneRM_pred) <- levels(testing_y)
...

The accuracy is much lower than before, so I am not sure that forcing the levels to match is a good solution.
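
A likely explanation (my assumption; the original post does not analyze it): levels<-() just renames the existing levels in positional order, it does not re-map the underlying values. So if oneRM_pred is missing a class, or orders its levels differently from testing_y, the labels get shuffled and the accuracy drops. A minimal sketch of a safer alignment:

# factor() re-codes the values against the full level set instead of
# renaming labels in place, so each prediction keeps its original class.
oneRM_pred <- factor(oneRM_pred, levels = levels(testing_y))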

Upvotes: 0

Thiago Procaci

Reputation: 1523

Let's start with F1.

Assuming you are using the iris dataset, we first need to load everything, train the model, and make the predictions as you did.

library(datasets)
library(caret)
library(OneR)
library(pROC)

trainIndex <- createDataPartition(iris$Species, p = 0.6, list = FALSE, times=1)
trainingSet <- iris[ trainIndex,]
testingSet  <- iris[-trainIndex,]
train_x <- trainingSet[, -ncol(trainingSet)]
train_y <- trainingSet$Species

testing_x <- testingSet[, -ncol(testingSet)]
testing_y <- testingSet$Species

oneRM <- OneR(trainingSet, verbose = TRUE)
oneRM_pred <- predict(oneRM, testing_x)

Then, you should calculate the precision, recall, and F1 for each class.

cm <- as.matrix(confusionMatrix(oneRM_pred, testing_y))
n = sum(cm) # number of instances
nc = nrow(cm) # number of classes
rowsums = apply(cm, 1, sum) # number of predictions per class (rows = Prediction)
colsums = apply(cm, 2, sum) # number of instances per class (columns = Reference)
diag = diag(cm)  # number of correctly classified instances per class

precision = diag / rowsums  # TP / (TP + FP)
recall = diag / colsums     # TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)

print(" ************ Confusion Matrix ************")
print(cm)
print(" ************ Diag ************")
print(diag)
print(" ************ Precision/Recall/F1 ************")
print(data.frame(precision, recall, f1)) 

After that, you are able to find the macro-averaged precision, recall, and F1.

macroPrecision = mean(precision)
macroRecall = mean(recall)
macroF1 = mean(f1)

print(" ************ Macro Precision/Recall/F1 ************")
print(data.frame(macroPrecision, macroRecall, macroF1)) 
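
If you also want a micro-averaged score (not covered above): in single-label multiclass data, every false positive for one class is a false negative for another, so micro precision, micro recall, and micro F1 all collapse to plain accuracy.

# Micro-averaged F1: with exactly one label per instance, this equals
# overall accuracy, i.e. the fraction of correctly classified instances.
microF1 = sum(diag) / n
print(data.frame(microF1))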

To find the ROC (or, more precisely, the AUC), it is best to use the pROC library.

print(" ************ AUC ************")
roc.multi <- multiclass.roc(testing_y, as.numeric(oneRM_pred))
print(auc(roc.multi))
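
Note that multiclass.roc here scores the hard labels via as.numeric, which treats the classes as ordered; it returns the Hand & Till (2001) multiclass AUC. So yes, AUC can make sense for multiclass problems, but it is coarse when computed from labels instead of class probabilities. As a rough cross-check (a sketch of mine, not part of the original answer), you can also compute one-vs-rest AUCs per class:

# One-vs-rest AUC per class, from hard 0/1 labels. With label-only
# predictions each ROC curve has a single operating point, so treat
# these numbers as coarse indicators rather than proper AUCs.
for (cls in levels(testing_y)) {
  resp <- as.numeric(testing_y == cls)   # 1 = this class, 0 = the rest
  pred <- as.numeric(oneRM_pred == cls)  # hard prediction as a "score"
  cat(cls, "one-vs-rest AUC:", auc(roc(resp, pred)), "\n")
}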

Hope it helps.

Find details at this link for F1 and this one for AUC.

Upvotes: 0
