Reputation: 175
I calculated the best threshold value as 0.61 (the one giving the highest accuracy) using the Python code below:
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, roc_curve

# predicted probabilities per class (column 1 = positive class)
y_pred_prob = tv_lr.predict_proba(tv_x_test_vector)
# fpr, tpr and the candidate thresholds from the ROC curve
fpr, tpr, threshold = roc_curve(y_test, y_pred_prob[:, 1])
# accuracy score at each candidate threshold
accuracy_ls = []
for thresh in threshold:
    y_pred = np.where(y_pred_prob[:, 1] > thresh, 1, 0)
    accuracy_ls.append(accuracy_score(y_test, y_pred))
# DataFrame of threshold/accuracy pairs
acc_thr_df = pd.concat([pd.Series(threshold), pd.Series(accuracy_ls)], axis=1)
acc_thr_df.columns = ['thresh', 'acc']
acc_thr_df.sort_values(by='acc', ascending=False)  # chose the 1st value
When I use tv_lr.predict(tv_x_test_vector), it uses 0.5 as the threshold.
How can I change the threshold value to 0.61? Is the code shown here the correct way to do this, instead of using tv_lr.predict(tv_x_test_vector)?
y_pred = np.where(y_pred_prob[:,1]>0.61, 1, 0)
Upvotes: 1
Views: 223
Reputation: 1232
The predict method of a LogisticRegression estimator doesn't accept a threshold argument; for a binary problem it always uses 0.5. So, as you say, you have to convert the probabilities into hard predictions yourself if you want a custom threshold.
Your code seems correct.
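For reference, here is a minimal, self-contained sketch of the same idea. It assumes a binary problem with classes 0 and 1. The helper name predict_with_threshold and the synthetic data are hypothetical stand-ins for illustration only, not part of scikit-learn or of your dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def predict_with_threshold(model, X, threshold=0.5):
    """Hypothetical helper: label 1 where P(class 1) exceeds `threshold`, else 0."""
    # Column 1 of predict_proba holds the probability of model.classes_[1].
    proba_pos = model.predict_proba(X)[:, 1]
    return (proba_pos > threshold).astype(int)


# Synthetic stand-in for the real data (an assumption for this sketch).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

y_pred_default = model.predict(X_test)                        # implicit 0.5 cut-off
y_pred_custom = predict_with_threshold(model, X_test, 0.61)   # custom cut-off

# Raising the threshold can only turn predicted 1s into 0s, never the reverse.
print(y_pred_default.sum(), y_pred_custom.sum())
```

With your objects, the equivalent call would be predict_with_threshold(tv_lr, tv_x_test_vector, 0.61), which gives the same result as your np.where line.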
Upvotes: 1