Reputation: 21
I am trying to generate a ROC curve for data that is highly imbalanced and multiclass (I know this is not ideal, but a reviewer requested it for the paper). scikit-learn has an example for this here: https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html
The specific code I am using is this:
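import matplotlib.pyplot as plt
from sklearn.metrics import RocCurveDisplay

# y_onehot_test: one-hot encoded true labels, y_score: predicted scores
# (both defined earlier, as in the linked example)
RocCurveDisplay.from_predictions(
    y_onehot_test.ravel(),
    y_score.ravel(),
    name="micro-average OvR",
    color="darkorange",
    plot_chance_level=True,
)
plt.axis("square")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Micro-averaged One-vs-Rest\nReceiver Operating Characteristic")
plt.legend()
plt.show()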
I am confused about the averaging: the title says the curve is "micro-averaged OvR", but where do I actually pass this information to the function?
y_onehot_test looks like this: 1 1 1 0 0 ...
and y_score looks like this: 0.783307 0.832748 0.619186 0.645178 0.654100 ...
Thanks for any insights and explanations :)
Upvotes: 0
Views: 197
Reputation: 21
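If anyone has this same question in the future: the answer lies in understanding micro-averaging better. Micro-averaging pools every (sample, class) entry into a single binary problem and gives each entry equal weight. That pooling is exactly what calling .ravel() on y_onehot_test and y_score does, so the function needs no averaging argument and no class information. If you do want to give classes different weights according to their size, weighted averaging is needed instead.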
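To make the role of the flattening concrete, here is a minimal pure-Python sketch. It is not scikit-learn's implementation: the toy labels and scores are made up, and `flatten` and `binary_auc` are hypothetical helpers written only for this illustration (`flatten` stands in for `.ravel()`). It compares the micro-average with a macro and a support-weighted average of per-class AUCs.

```python
# Toy data, invented for illustration: 4 samples, 3 classes.
y_onehot = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
]
y_score = [
    [0.7, 0.2, 0.1],
    [0.3, 0.5, 0.2],
    [0.2, 0.5, 0.3],
    [0.4, 0.4, 0.2],
]
n_classes = 3


def flatten(rows):
    """Row-major flattening of a 2-D list (stand-in for numpy's .ravel())."""
    return [value for row in rows for value in row]


def binary_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Micro-average: pool all (sample, class) entries into ONE binary problem.
# No class information survives the flattening, and none is needed.
micro_auc = binary_auc(flatten(y_onehot), flatten(y_score))

# Per-class one-vs-rest AUCs, needed for macro and weighted averages.
per_class_auc = [
    binary_auc([row[k] for row in y_onehot], [row[k] for row in y_score])
    for k in range(n_classes)
]
support = [sum(row[k] for row in y_onehot) for k in range(n_classes)]

# Macro: every class counts equally. Weighted: classes count by their size.
macro_auc = sum(per_class_auc) / n_classes
weighted_auc = sum(a * s for a, s in zip(per_class_auc, support)) / sum(support)

print(f"micro={micro_auc:.4f} macro={macro_auc:.4f} weighted={weighted_auc:.4f}")
```

The three numbers generally differ, which is the point: the averaging scheme is decided by how the inputs are prepared (flattened once versus split per class), not by an argument to the plotting function. In scikit-learn itself, the linked example computes the corresponding score with `roc_auc_score(..., multi_class="ovr", average="micro")`.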
Upvotes: 0