Salman Baqri

Reputation: 109

ROC_AUC_SCORE is different while calculating using predict() vs predict_proba() in Random Forest

predict() and predict_proba() give different roc_auc_score values in a Random Forest.

I understand that predict_proba() returns probabilities: in binary classification it gives two probabilities, one for each class. predict() returns the predicted class.
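To make the difference concrete, here is a minimal sketch on toy data (not the asker's Titanic features) showing the shapes the two methods return:

```python
# Toy illustration: predict() returns hard class labels, while
# predict_proba() returns one probability column per class.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=100, n_features=4, random_state=0)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

labels = rf.predict(X)        # shape (100,): hard 0/1 labels
probas = rf.predict_proba(X)  # shape (100, 2): P(class 0), P(class 1) per row

print(labels[:3])
print(probas[:3])
```

Each row of `probas` sums to 1, and `probas[:, 1]` is the continuous score that roc_auc_score expects for the positive class.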

    #Using predict_proba()
    rf = RandomForestClassifier(n_estimators=200, random_state=39)
    rf.fit(X_train[['Cabin_mapped', 'Sex']], y_train)

    #make predictions on train and test set
    pred_train = rf.predict_proba(X_train[['Cabin_mapped', 'Sex']])
    pred_test = rf.predict_proba(X_test[['Cabin_mapped', 'Sex']].fillna(0))

    print('Train set')
    print('Random Forests using predict_proba roc-auc: {}'.format(roc_auc_score(y_train, pred_train[:, 1])))

    print('Test set')
    print('Random Forests using predict_proba roc-auc: {}'.format(roc_auc_score(y_test, pred_test[:, 1])))

    #Using predict()
    pred_train = rf.predict(X_train[['Cabin_mapped', 'Sex']])
    pred_test = rf.predict(X_test[['Cabin_mapped', 'Sex']].fillna(0))

    print('Train set')
    print('Random Forests using predict roc-auc: {}'.format(roc_auc_score(y_train, pred_train)))
    print('Test set')
    print('Random Forests using predict roc-auc: {}'.format(roc_auc_score(y_test, pred_test)))

Train set Random Forests using predict_proba roc-auc: 0.8199550985878832

Test set Random Forests using predict_proba roc-auc: 0.8332142857142857

Train set Random Forests using predict roc-auc: 0.7779440793041364

Test set Random Forests using predict roc-auc: 0.7686904761904761

Upvotes: 3

Views: 6218

Answers (2)

jrreda

Reputation: 814

predict() returns the predicted class label.

predict_proba() returns the predicted probability for each class.

Upvotes: 0

Jindřich

Reputation: 11213

As you said, the predict function returns the prediction as a True/False value, whereas predict_proba returns probabilities, i.e. values between zero and one, and this is the reason for the difference.

AUC means "area under the curve" which is indeed different if the curve is a 0/1 step function or a curve made of continuous values.

Let's imagine you have only one example, and it should be classified as False. If your classifier yields a probability of 0.7, the ROC-AUC value is 1.0 - 0.7 = 0.3. If you used predict, the prediction would be True = 1.0, so the ROC-AUC would be 1.0 - 1.0 = 0.0.
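The point can be sketched with hand-picked toy scores (not the asker's data): scoring the same ranking as continuous probabilities vs. as 0/1 labels gives different ROC-AUC values, because thresholding collapses the curve into a step function:

```python
# Toy example: ROC-AUC on continuous scores vs. thresholded 0/1 labels.
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.6, 0.4, 0.9])  # predict_proba-style scores for class 1

auc_proba = roc_auc_score(y_true, scores)                    # uses the full ranking
auc_label = roc_auc_score(y_true, (scores >= 0.5).astype(int))  # hard 0/1 labels

print(auc_proba)  # 0.75: three of four neg/pos pairs correctly ordered
print(auc_label)  # 0.5: ties between the hard labels lose ranking information
```

With hard labels, all examples predicted as the same class are tied, so the pairwise ordering that ROC-AUC measures is partly lost.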

Upvotes: 7
