Reputation: 75
I am using a logistic regression algorithm for multi-class text classification. I need a way to get the confidence score along with the predicted category. For example, if I pass text = "Hello this is sample text" to the model, I should get predicted class = Class A and confidence = 80% as a result.
Upvotes: 0
Views: 13334
Reputation: 88236
For most classifiers in scikit-learn, we can get the probability estimates for the classes through predict_proba. Bear in mind that this is the actual output of the logistic (softmax) function; the resulting classification is obtained by selecting the output with the highest probability, i.e. an argmax is applied on the output. If we look at the implementation of predict, we can see that it is essentially doing:
def predict(self, X):
    # decision func on input array
    scores = self.decision_function(X)
    # column indices of max values per row
    indices = scores.argmax(axis=1)
    # index class array using indices
    return self.classes_[indices]
When predict_proba is called rather than predict, the argmax step is skipped: the scores are converted to probabilities and returned for every class. Here's an example use case training a LogisticRegression:
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
lr = LogisticRegression()
lr.fit(X_train, y_train)
y_pred_prob = lr.predict_proba(X_test)
y_pred_prob
array([[1.06906558e-02, 9.02308167e-01, 8.70011771e-02],
[2.57953117e-06, 7.88832490e-03, 9.92109096e-01],
[2.66690975e-05, 6.73454730e-02, 9.32627858e-01],
[9.88612145e-01, 1.13878133e-02, 4.12714660e-08],
...
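To make the link between predict, decision_function and predict_proba concrete, here is a minimal sketch that checks it numerically. It assumes a recent scikit-learn where a multi-class LogisticRegression with default settings is fit as a multinomial model, so the probabilities are a softmax over the decision scores; random_state=0 and max_iter=1000 are my own choices for reproducibility and convergence, not part of the example above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
# random_state is an assumption, added only to make the run repeatable
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
lr = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Raw per-class scores, one column per class
scores = lr.decision_function(X_test)

# Numerically stable softmax over each row of scores
exp_scores = np.exp(scores - scores.max(axis=1, keepdims=True))
softmax_scores = exp_scores / exp_scores.sum(axis=1, keepdims=True)

proba = lr.predict_proba(X_test)

# Probabilities match the softmax of the scores (multinomial assumption)
print(np.allclose(proba, softmax_scores))
# predict() matches indexing classes_ with the row-wise argmax
print(np.array_equal(lr.predict(X_test), lr.classes_[proba.argmax(axis=1)]))
```

Under those assumptions both checks should agree. With a one-vs-rest setup the probabilities are normalised differently, so the first check would not be expected to hold.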
And we can obtain the predicted classes by taking the argmax, as mentioned, and indexing the array of classes:
classes = load_iris().target_names
# column index of the highest probability in each row
indices = y_pred_prob.argmax(axis=1)
classes[indices]
array(['virginica', 'virginica', 'versicolor', 'virginica', 'setosa',
'versicolor', 'versicolor', 'setosa', 'virginica', 'setosa',...
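If you want the confidence for every row at once rather than just the labels, the row-wise maximum of the probability matrix gives it directly. This is only a sketch on the same iris setup; the variable names, random_state=0 and max_iter=1000 are my own assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
# random_state is an assumption, added only to make the run repeatable
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0
)
lr = LogisticRegression(max_iter=1000).fit(X_train, y_train)

proba = lr.predict_proba(X_test)

# Column index of the most probable class for each sample
indices = proba.argmax(axis=1)
# Map column index -> integer label -> human-readable name
labels = iris.target_names[lr.classes_[indices]]
# The confidence is the probability of the chosen class
confidences = proba.max(axis=1)

for label, conf in zip(labels[:5], confidences[:5]):
    print(f"{label}: {conf:.2%}")
```

Going through lr.classes_ before indexing the names is a small safety step: the columns of predict_proba follow the order of lr.classes_, which here happens to be 0, 1, 2.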
So for a single prediction, through the predicted probabilities we could easily do something like:
y_pred_prob = lr.predict_proba(X_test[0,None])
ix = y_pred_prob.argmax(1).item()
print(f'predicted class = {classes[ix]} and confidence = {y_pred_prob[0,ix]:.2%}')
# predicted class = virginica and confidence = 90.75%
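Since the question is about text, the same idea carries over if the classifier sits behind a vectorizer. Below is a rough sketch assuming a TF-IDF + LogisticRegression pipeline; the toy corpus, the labels and the predict_with_confidence helper are all hypothetical and only there for illustration, so swap in your own data and feature extraction.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus, for illustration only
texts = [
    "the team won the match",
    "a great goal in the final",
    "the election results are in",
    "parliament passed the bill",
    "new phone with a faster chip",
    "the laptop has a better screen",
]
labels = ["sports", "sports", "politics", "politics", "tech", "tech"]

# The vectorizer turns raw strings into features; the classifier sees those
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)


def predict_with_confidence(text):
    """Return (predicted label, probability of that label) for one string."""
    proba = model.predict_proba([text])[0]  # one row: one probability per class
    ix = proba.argmax()
    return model.classes_[ix], proba[ix]


label, confidence = predict_with_confidence("the team scored a late goal")
print(f"predicted class = {label} and confidence = {confidence:.2%}")
```

Keep in mind that this "confidence" is just the model's estimated probability for the winning class. With very little training data, as in this toy corpus, the probabilities tend to sit close to uniform.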
Upvotes: 3