Måns Axelsson

Reputation: 31

Different result roc_auc_score and plot_roc_curve

I am training a RandomForestClassifier (sklearn) to predict credit card fraud. When I test the model and check the ROC AUC score, I get different values from roc_auc_score and plot_roc_curve: roc_auc_score gives me around 0.89, while plot_roc_curve calculates the AUC as 0.96. Why is that?

The labels are all 0 or 1, and the predictions are also 0 or 1. Code:

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, plot_roc_curve
import matplotlib.pyplot as plt

clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train[target].values)
pred_test = clf.predict(X_test)                  # hard 0/1 class labels
print(roc_auc_score(y_test, pred_test))          # prints ~0.89
clf_disp = plot_roc_curve(clf, X_test, y_test)   # plot shows AUC ~0.96
plt.show()

Output of the code (the roc_auc_score value is printed just above the graph).

[Image: ROC curve from plot_roc_curve, showing AUC = 0.96]

Upvotes: 3

Views: 2465

Answers (2)

Venkatachalam

Reputation: 16966

You are feeding the predicted classes instead of the predicted probabilities to roc_auc_score. With hard 0/1 predictions there is only a single threshold, so the AUC computed from them (0.89) differs from the one plot_roc_curve reports (0.96), because plot_roc_curve computes probability scores internally from the fitted classifier.

From Documentation:

y_score: array-like of shape (n_samples,) or (n_samples, n_classes)

Target scores. In the binary and multilabel cases, these can be either probability estimates or non-thresholded decision values (as returned by decision_function on some classifiers).

Change your code to:


clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train[target].values)
y_score = clf.predict_proba(X_test)           # note: predict_proba, not predict_prob
print(roc_auc_score(y_test, y_score[:, 1]))   # pass probabilities of the positive class
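
Here y_score[:, 1] selects the second column of predict_proba, i.e. the probability of the positive class (class 1), which is what roc_auc_score expects for a binary problem. With probabilities as input, roc_auc_score should report the same ≈0.96 AUC that plot_roc_curve shows.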

Upvotes: 3

Vatsal Gupta

Reputation: 511

The ROC curve and roc_auc_score both take prediction probabilities as input, but as I can see from your code, you are providing the predicted labels. You need to fix that.
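
A minimal sketch of the fix, assuming the clf, X_test, and y_test from the question are already in scope (and a scikit-learn version that still provides plot_roc_curve):

# Probability of the positive class (fraud) for each test sample.
proba = clf.predict_proba(X_test)[:, 1]

# Both calls now score the same probability estimates, so the AUCs agree.
print(roc_auc_score(y_test, proba))
plot_roc_curve(clf, X_test, y_test)   # computes probabilities internally
plt.show()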

Upvotes: 1
