Eli

Reputation: 35

Calculating the precision and recall for a specific threshold

I have created a logistic regression model that doesn't perform very well. Still, I calculated the best threshold based on the highest accuracy score, which turned out to be 0.04. Now I would like to use that threshold to calculate the precision and recall, but I cannot find any example of how to determine these values. Which function do I need to use?

Upvotes: 3

Views: 5834

Answers (2)

Enzo Ramirez C.

Reputation: 153

To do what you want, I start by predicting the probabilities with my model, then I convert the array of probabilities to an array of true/false (0/1) values using the threshold I want, and finally I compute the metric by comparing the array of predicted values to the true values.

For example:

# import the precision and recall functions from scikit-learn
from sklearn.metrics import precision_score, recall_score

# compute the probabilities
y_pred_prob = model.predict_proba(features)[:, 1]

# for a threshold of 0.5
precision0_5 = precision_score(true_labels, y_pred_prob > 0.5)
recall0_5 = recall_score(true_labels, y_pred_prob > 0.5)

# for a threshold of 0.04 (in your case)
precision0_04 = precision_score(true_labels, y_pred_prob > 0.04)
recall0_04 = recall_score(true_labels, y_pred_prob > 0.04)
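
As a related note, if you also want to see precision and recall across all candidate thresholds (for instance, when picking the threshold in the first place), scikit-learn's precision_recall_curve computes them in one call. A minimal sketch, reusing the true_labels and y_pred_prob arrays from above:

from sklearn.metrics import precision_recall_curve
import numpy as np

# precision/recall for every candidate threshold in one call
precision, recall, thresholds = precision_recall_curve(true_labels, y_pred_prob)

# find the entry whose threshold is closest to 0.04
# (note: precision_recall_curve uses >= at each threshold, so results may
# differ slightly from the > comparison above when probabilities tie)
idx = np.argmin(np.abs(thresholds - 0.04))
print(f"threshold={thresholds[idx]:.3f}, "
      f"precision={precision[idx]:.3f}, recall={recall[idx]:.3f}")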

Upvotes: 5

DSH

Reputation: 1139

You can use precision_score and recall_score from scikit-learn to calculate precision and recall. These functions do not require the threshold you specified as an argument; they compare the true labels against the predicted labels directly. Below I also include accuracy_score and confusion_matrix, since these generally go together when evaluating a classifier's results.

from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score
import pandas as pd

def my_classifier_results(model, x_test, y_test):
    # predict with the model's default decision threshold (0.5 for
    # logistic regression), not a custom one
    y_true = y_test
    y_pred = model.predict(x_test)
    accuracy = accuracy_score(y_true, y_pred)
    # weighted averages account for any class imbalance
    precision = precision_score(y_true, y_pred, average="weighted")
    sensitivity = recall_score(y_true, y_pred, average="weighted")
    print(f"Accuracy: {accuracy}, precision: {round(precision, 4)}, sensitivity: {round(sensitivity, 4)}\n")
    # confusion matrix with labeled rows/columns (class 1 = bad, class 0 = good)
    cmtx = pd.DataFrame(
        confusion_matrix(y_true, y_pred, labels=[1, 0]),
        index=['true:bad', 'true:good'],
        columns=['pred:bad', 'pred:good']
    )
    print(f"{cmtx}\n")

Example output:

[screenshot of the printed accuracy/precision/sensitivity line followed by the labeled confusion matrix]

Upvotes: -2
