pgg08927

Reputation: 175

How to use calculated Threshold Value in Logistic Regression?

Using the Python code below, I found that a threshold of 0.61 gives the highest accuracy:

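import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, roc_curve

# Predicted class probabilities; column 1 holds the probability of class 1
y_pred_prob = tv_lr.predict_proba(tv_x_test_vector)

# Candidate thresholds taken from the ROC curve
fpr, tpr, threshold = roc_curve(y_test, y_pred_prob[:, 1])

# Accuracy at each candidate threshold
accuracy_ls = []
for thresh in threshold:
    y_pred = np.where(y_pred_prob[:, 1] > thresh, 1, 0)
    accuracy_ls.append(accuracy_score(y_test, y_pred))

# Collect thresholds and accuracies in a DataFrame, best accuracy first
acc_thr_df = pd.concat([pd.Series(threshold), pd.Series(accuracy_ls)], axis=1)
acc_thr_df.columns = ['thresh', 'acc']
acc_thr_df.sort_values(by='acc', ascending=False)  # the first row is the chosen threshold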

When I call tv_lr.predict(tv_x_test_vector), it uses 0.5 as the threshold.

How can I change the threshold to 0.61? Is the code below a correct way to do this instead of calling tv_lr.predict(tv_x_test_vector)?

y_pred = np.where(y_pred_prob[:, 1] > 0.61, 1, 0)

Upvotes: 1

Views: 223

Answers (1)

Nikhil Kumar

Reputation: 1232

The predict method of a LogisticRegression estimator does not accept a threshold argument; for a binary problem it always uses a cutoff of 0.5 on the predicted probability. So, as you say, to use a custom threshold you have to convert the probabilities from predict_proba into hard predictions yourself.

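Your code is correct, provided your labels are 0 and 1: it assigns class 1 whenever the predicted probability of class 1 (the second column of predict_proba) is above 0.61, and class 0 otherwise.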
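To make this reusable, the thresholding step could be wrapped in a small helper. The sketch below is only an illustration: apply_threshold is a hypothetical name, not part of scikit-learn, and the probabilities are made-up values standing in for y_pred_prob[:, 1]. It assumes binary labels 0 and 1.

```python
import numpy as np


def apply_threshold(positive_proba, threshold=0.5):
    """Turn positive-class probabilities into hard 0/1 predictions.

    Hypothetical helper (not a scikit-learn function). Assumes the
    labels are 0 and 1 and that `positive_proba` is the probability
    of class 1, e.g. the second column of predict_proba.
    """
    positive_proba = np.asarray(positive_proba)
    # Strict comparison, matching the np.where(... > 0.61, 1, 0) in the question
    return np.where(positive_proba > threshold, 1, 0)


# Made-up probabilities standing in for y_pred_prob[:, 1]
example_proba = np.array([0.20, 0.55, 0.61, 0.70, 0.95])

default_pred = apply_threshold(example_proba)        # cutoff of 0.5
custom_pred = apply_threshold(example_proba, 0.61)   # cutoff of 0.61

print(default_pred)
print(custom_pred)
```

With the model from the question, the call would look like apply_threshold(tv_lr.predict_proba(tv_x_test_vector)[:, 1], 0.61). Note that the comparison is strict, so a probability of exactly 0.61 is assigned to class 0.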

Upvotes: 1
