Reputation: 23
I have this code for a multiclass classification problem:
data$Class = as.factor(data$Class)
levels(data$Class) <- make.names(levels(factor(data$Class)))
trainIndex <- createDataPartition(data$Class, p = 0.6, list = FALSE, times=1)
trainingSet <- data[ trainIndex,]
testingSet <- data[-trainIndex,]
train_x <- trainingSet[, -ncol(trainingSet)]
train_y <- trainingSet$Class
testing_x <- testingSet[, -ncol(testingSet)]
testing_y <- testingSet$Class
oneRM <- OneR(trainingSet, verbose = TRUE)
oneRM_pred <- predict(oneRM, testing_x)
eval_model(oneRM_pred, testing_y)
AUC_oneRM_pred <- auc(roc(oneRM_pred,testing_y))
cat ("AUC=", oneRM_pred)
# Recall-Precision curve
oneRM_prediction <- prediction(oneRM_pred, testing_y)
RP.perf <- performance(oneRM_prediction, "tpr", "fpr")
plot (RP.perf)
But the code does not work. After this line:
oneRM_prediction <- prediction(oneRM_pred, testing_y)
I get this error:
Error in prediction(oneRM_pred, testing_y) : Format of predictions is invalid.
In addition, I don't know how to easily get the F1-measure.
Finally, does it make sense to calculate AUC for a multi-class classification problem?
Upvotes: 1
Views: 1965
Reputation: 23
If I use levels(oneRM_pred) <- levels(testing_y) in this way:
oneRM <- OneR(trainingSet, verbose = TRUE)
oneRM_pred <- predict(oneRM, testing_x)
levels(oneRM_pred) <- levels(testing_y)
The accuracy is much lower than before, so I am not sure whether enforcing the same levels is a good solution.
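One possible explanation, assuming the predicted factor's levels differ in order or number from those of testing_y: `levels(x) <- value` renames levels by position instead of matching labels, so predictions can be silently relabeled. The base-R sketch below uses made-up factors (not OneR output) to contrast renaming with re-aligning via `factor(x, levels = ...)`.

```r
# Hypothetical data: identical labels, but the prediction factor
# stores its levels in a different order than the reference factor.
actual <- factor(c("a", "b", "c", "c"), levels = c("a", "b", "c"))
pred   <- factor(c("a", "b", "c", "c"), levels = c("c", "b", "a"))

# Renaming: `levels<-` replaces level names by position,
# so every former "c" is now called "a" and vice versa.
renamed <- pred
levels(renamed) <- levels(actual)

# Re-aligning: factor(..., levels = ) keeps each label as it is
# and only changes which levels exist and in what order.
aligned <- factor(as.character(pred), levels = levels(actual))

acc_renamed <- mean(renamed == actual)  # accuracy after renaming
acc_aligned <- mean(aligned == actual)  # accuracy after re-aligning
print(c(renamed = acc_renamed, aligned = acc_aligned))
```

If this is what happens with your data, re-aligning keeps the predictions intact, while renaming changes what each prediction means.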
Upvotes: 0
Reputation: 1523
Let's start with F1.
Assuming you are using the iris dataset, we first load the packages, train the model, and make the predictions as you did.
library(caret) # createDataPartition, confusionMatrix
library(OneR) # OneR
library(pROC) # multiclass.roc
trainIndex <- createDataPartition(iris$Species, p = 0.6, list = FALSE, times=1)
trainingSet <- iris[ trainIndex,]
testingSet <- iris[-trainIndex,]
train_x <- trainingSet[, -ncol(trainingSet)]
train_y <- trainingSet$Species
testing_x <- testingSet[, -ncol(testingSet)]
testing_y <- testingSet$Species
oneRM <- OneR(trainingSet, verbose = TRUE)
oneRM_pred <- predict(oneRM, testing_x)
Then, you should calculate the precision, recall, and F1 for each class.
cm <- as.matrix(confusionMatrix(oneRM_pred, testing_y))
n = sum(cm) # number of instances
nc = nrow(cm) # number of classes
rowsums = apply(cm, 1, sum) # number of predictions per class (caret puts predictions in rows)
colsums = apply(cm, 2, sum) # number of actual instances per class (reference in columns)
diag = diag(cm) # number of correctly classified instances per class
precision = diag / rowsums
recall = diag / colsums
f1 = 2 * precision * recall / (precision + recall)
print(" ************ Confusion Matrix ************")
print(cm)
print(" ************ Diag ************")
print(diag)
print(" ************ Precision/Recall/F1 ************")
print(data.frame(precision, recall, f1))
After that, you can compute the macro-averaged precision, recall, and F1.
macroPrecision = mean(precision)
macroRecall = mean(recall)
macroF1 = mean(f1)
print(" ************ Macro Precision/Recall/F1 ************")
print(data.frame(macroPrecision, macroRecall, macroF1))
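As a quick check of these formulas, here is a minimal base-R sketch on a made-up 3-class confusion matrix (the counts are hypothetical, not OneR output). It assumes predictions are in rows and actual classes in columns, which is the layout of the table that caret's confusionMatrix returns.

```r
# Hypothetical confusion matrix: rows = predicted class, columns = actual class
classes <- c("setosa", "versicolor", "virginica")
cm <- matrix(c(20,  0,  0,
                0, 16,  2,
                0,  4, 18),
             nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = classes, Reference = classes))

correct   <- diag(cm)                # correctly classified instances per class
precision <- correct / rowSums(cm)   # share of each predicted class that is right
recall    <- correct / colSums(cm)   # share of each actual class that is found
f1        <- 2 * precision * recall / (precision + recall)

macroF1  <- mean(f1)                 # unweighted mean over the classes
accuracy <- sum(correct) / sum(cm)   # overall share of correct predictions

print(round(data.frame(precision, recall, f1), 3))
cat("macro F1 =", round(macroF1, 3), " accuracy =", round(accuracy, 3), "\n")
```

Because F1 is symmetric in precision and recall, mixing up rows and columns leaves F1 and macro F1 unchanged but swaps the reported precision and recall.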
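To compute the AUC for more than two classes, it is best to use the pROC package:
print(" ************ AUC ************")
# as.numeric() passes the factor's integer codes as the predictor score
roc.multi <- multiclass.roc(testing_y, as.numeric(oneRM_pred))
print(auc(roc.multi))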
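On whether AUC makes sense with more than two classes: it can, as long as it is reduced to binary comparisons. pROC's multiclass.roc does this with the pairwise averaging defined by Hand and Till. Another common reduction is one-vs-rest, sketched below in base R. The labels and per-class scores are made up for illustration; in practice you would need class probabilities or scores from your model, not hard class predictions.

```r
# AUC of a numeric score for a binary outcome, using the rank-sum
# (Mann-Whitney) identity; tied scores receive average ranks.
binary_auc <- function(score, is_positive) {
  r     <- rank(score)
  n_pos <- sum(is_positive)
  n_neg <- sum(!is_positive)
  (sum(r[is_positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical data: true labels and one score column per class
actual <- factor(c("a", "a", "b", "b", "c", "c"))
scores <- cbind(a = c(0.8, 0.6, 0.3, 0.1, 0.2, 0.1),
                b = c(0.1, 0.3, 0.5, 0.2, 0.3, 0.2),
                c = c(0.1, 0.1, 0.2, 0.7, 0.5, 0.7))

# One-vs-rest: treat each class in turn as the positive class
ovr_auc <- sapply(levels(actual), function(cl) {
  binary_auc(scores[, cl], actual == cl)
})

print(ovr_auc)        # one AUC per class
print(mean(ovr_auc))  # macro-averaged one-vs-rest AUC
```

The per-class values show which classes are hard to separate, which a single averaged number hides.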
Hope this helps.
Find details on this link for F1 and this for AUC.
Upvotes: 0