Reputation: 35
I have created a logistic regression model that doesn't perform very well. I still calculated the best threshold based on the highest accuracy score, which turned out to be 0.04. Now I would like to use that threshold to calculate precision and recall, but I cannot find any example of how to determine these values. Could you please tell me which function I need to use?
Upvotes: 3
Views: 5834
Reputation: 153
To do what you want, first predict the probabilities with your model, then convert the array of probabilities to an array of true/false (1/0) values using the threshold you want, and finally compute the metric you want by comparing the array of predicted values to the true labels.
For example:
# import the precision and recall functions from scikit-learn
from sklearn.metrics import precision_score, recall_score
# compute the probabilities
y_pred_prob = model.predict_proba(features)[:, 1]
# for a threshold of 0.5
precision0_5 = precision_score(true_labels, y_pred_prob > 0.5)
recall0_5 = recall_score(true_labels, y_pred_prob > 0.5)
# for a threshold of 0.04 (in your case)
precision0_04 = precision_score(true_labels, y_pred_prob > 0.04)
recall0_04 = recall_score(true_labels, y_pred_prob > 0.04)
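If you want to see precision and recall at every candidate threshold at once (useful when you are still searching for a threshold), scikit-learn's `precision_recall_curve` computes both in one call. A minimal sketch with made-up labels and probabilities (the arrays below are illustrative, not your data):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# illustrative true labels and predicted probabilities
true_labels = np.array([0, 0, 1, 1, 0, 1])
y_pred_prob = np.array([0.02, 0.03, 0.05, 0.60, 0.30, 0.90])

# precision and recall at every threshold where the predictions change
precisions, recalls, thresholds = precision_recall_curve(true_labels, y_pred_prob)
for p, r, t in zip(precisions, recalls, thresholds):
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
```

Note that `precisions` and `recalls` have one more element than `thresholds`: the final pair corresponds to the degenerate point where nothing is predicted positive.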
Upvotes: 5
Reputation: 1139
You can use precision_score
and recall_score
from scikit-learn to calculate precision and recall. The threshold you specified is not an argument to these functions; also note that model.predict uses the default 0.5 threshold, so to apply your own threshold you would need to threshold the output of predict_proba yourself, as in the other answer. Below I also included accuracy_score and confusion_matrix, since these generally go together when evaluating a classifier's results.
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
import pandas as pd
def my_classifier_results(model, x_test, y_test):
    y_true = y_test
    y_pred = model.predict(x_test)
    accuracy = accuracy_score(y_true, y_pred)
    precision = precision_score(y_true, y_pred, average="weighted")
    sensitivity = recall_score(y_true, y_pred, average="weighted")
    print(f"Accuracy: {accuracy}, precision: {round(precision,4)}, sensitivity: {round(sensitivity,4)}\n")
    cmtx = pd.DataFrame(
        confusion_matrix(y_true, y_pred, labels=[1, 0]),
        index=['true:bad', 'true:good'],
        columns=['pred:bad', 'pred:good']
    )
    print(f"{cmtx}\n")
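To try it end to end, here is a self-contained usage sketch. The data and model are made up with `make_classification` purely so there is something fitted to pass in; the variable names are illustrative:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

def my_classifier_results(model, x_test, y_test):
    # same helper as defined above
    y_true = y_test
    y_pred = model.predict(x_test)
    accuracy = accuracy_score(y_true, y_pred)
    precision = precision_score(y_true, y_pred, average="weighted")
    sensitivity = recall_score(y_true, y_pred, average="weighted")
    print(f"Accuracy: {accuracy}, precision: {round(precision,4)}, sensitivity: {round(sensitivity,4)}\n")
    cmtx = pd.DataFrame(
        confusion_matrix(y_true, y_pred, labels=[1, 0]),
        index=['true:bad', 'true:good'],
        columns=['pred:bad', 'pred:good']
    )
    print(f"{cmtx}\n")

# synthetic binary classification data, just to have a fitted model
X, y = make_classification(n_samples=200, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(x_train, y_train)
my_classifier_results(model, x_test, y_test)
```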
Example output:
Upvotes: -2