syed jameer

Reputation: 454

Predicted class along with its corresponding probability

I have built a machine learning model using a max-voting classifier (Decision Tree, Random Forest, Logistic Regression). The input I pass to it is:

{ "Salary": 50000, "Current loans": 15000, "Credit Score": 616, "Requested Loan": 25000 }

When I pass this data to my model, it gives the prediction as:

{"Status": Approve}

But I need to retrieve the response like:

{"Status": Approve, "Accuracy": 0.87}

Any help would be much appreciated

Upvotes: 1

Views: 845

Answers (1)

yatu

Reputation: 88305

It looks like you're probably using sklearn's VotingClassifier. Once you've fitted the classifier, you can obtain the probabilities associated with each class through the predict_proba method. Note that rather than an accuracy, this is really the probability associated with each class. So if you want the probability of a test sample being of class n, you'll have to index the output y_pred_prob on the corresponding column. Here's an example using sklearn's iris dataset:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

clf1 = LogisticRegression(multi_class='multinomial', random_state=1)
clf2 = RandomForestClassifier(n_estimators=50, random_state=1)
clf3 = GaussianNB()

X, y = load_iris(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(X, y)

eclf2 = VotingClassifier(estimators=[
        ('lr', clf1), ('rf', clf2), ('gnb', clf3)],
        voting='soft')

eclf2 = eclf2.fit(X_train, y_train)
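As an aside, the ensemble's overall accuracy on the test set is a separate, single number, available through the classifier's score method; it shouldn't be confused with the per-sample probabilities discussed next (a minimal sketch on the same fitted eclf2):

print(eclf2.score(X_test, y_test))   # mean accuracy over the whole test set (one number for the model)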

And we could get the probabilities associated with the first class, for instance, with:

eclf2.predict_proba(X_test)[:,0].round(2)

array([0.99, 0.  , 0.  , 0.  , 0.  , 0.  , 0.01, 0.01, 0.  , 0.  , 0.  ,
       0.99, 0.  , 0.99, 0.99, 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ,
       0.  , 0.01, 0.98, 0.  , 1.  , 0.99, 0.  , 0.  , 0.  , 0.99, 0.98,
       0.  , 0.99, 0.  , 0.01, 0.99])
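The column order of predict_proba follows eclf2.classes_, so if you need the column for a specific label you can look it up explicitly rather than hard-coding the index (a small sketch; for iris the labels happen to be 0, 1, 2 already):

import numpy as np

# column index of the label 0 in the probability matrix
col = np.where(eclf2.classes_ == 0)[0][0]
eclf2.predict_proba(X_test)[:, col].round(2)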

Finally, to get an output like the one you've described, you can use the result returned by predict to index the 2D probability array as follows:

import numpy as np
import pandas as pd

y_pred = eclf2.predict(X_test)                        # predicted class for each test sample
y_pred_prob = eclf2.predict_proba(X_test).round(2)    # probability of every class for each sample
# pick, for each row, the probability of the class that was predicted
associated_prob = y_pred_prob[np.arange(len(y_test)), y_pred]
pd.DataFrame({'class': y_pred, 'Accuracy': associated_prob})

    class  Accuracy
0       0      0.99
1       2      0.84
2       2      1.00
3       1      0.95
4       2      0.99
5       2      0.91
6       1      0.98
7       1      0.98
8       1      0.93

Or if you prefer the output as a dictionary:

pd.DataFrame({'class':y_pred, 'Accuracy':associated_prob}).to_dict(orient='index')

 {0: {'class': 0, 'Accuracy': 0.99},
  1: {'class': 2, 'Accuracy': 0.84},
  2: {'class': 2, 'Accuracy': 1.0},
  3: {'class': 1, 'Accuracy': 0.95},
  4: {'class': 2, 'Accuracy': 0.99},
  ...}

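Mapping this back to your loan data, here is a minimal sketch of building the response you asked for (assuming a fitted binary classifier clf with classes such as 'Approve'/'Reject' and features in the same order as your JSON; the names here are illustrative, not taken from your code):

import numpy as np

# Salary, Current loans, Credit Score, Requested Loan
sample = [[50000, 15000, 616, 25000]]

pred = clf.predict(sample)[0]                        # e.g. 'Approve'
proba = clf.predict_proba(sample)[0]                 # probabilities, ordered as clf.classes_
response = {"Status": pred,
            "Accuracy": round(float(proba[np.where(clf.classes_ == pred)[0][0]]), 2)}
print(response)   # e.g. {'Status': 'Approve', 'Accuracy': 0.87}

Keep in mind that the "Accuracy" value here is really the predicted probability of the chosen class, not the model's accuracy.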
Upvotes: 3
