Reputation: 2171
I would like to get a confidence score for each prediction the classifier makes, showing how sure the classifier is that its prediction is correct.
I want something like this:
How sure is the classifier on its prediction?
Class 1: 81% that this is class 1
Class 2: 10%
Class 3: 6%
Class 4: 3%
Samples of my code:
from time import time
from sklearn import cross_validation
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix

features_train, features_test, labels_train, labels_test = cross_validation.train_test_split(main, target, test_size=0.4)

# Determine amount of time to train
t0 = time()
model = SVC()
#model = SVC(kernel='poly')
#model = GaussianNB()
model.fit(features_train, labels_train)
print 'training time: ', round(time()-t0, 3), 's'

# Determine amount of time to predict
t1 = time()
pred = model.predict(features_test)
print 'predicting time: ', round(time()-t1, 3), 's'

accuracy = accuracy_score(labels_test, pred)
print 'Confusion Matrix: '
print confusion_matrix(labels_test, pred)
# Accuracy in the 0.9333, 0.6667, 1.0 range
print accuracy

# Determine amount of time to predict on the unlabeled data
t1 = time()
pred = model.predict(sub_main)
print 'predicting time: ', round(time()-t1, 3), 's'
print ''
print 'Prediction: '
print pred
I suspect that I should use the score() function, but I can't seem to implement it correctly. I don't know if that's the right function or not, but how would one get the confidence percentage of a classifier's prediction?
Upvotes: 37
Views: 55134
Reputation: 1
Using the code below, you will get the four class names with a predicted probability for each sample. You can change no_of_class to however many classes you need.
import numpy as np

# Probability of each class, one row per sample
probas1 = model.predict_proba(sub_main)
no_of_class = 4

# Column indices of the top classes, sorted by descending probability
top_classes1 = np.argsort(-probas1, axis=1)[:, :no_of_class]
# Map those indices back to the class labels
class_labels1 = [model.classes_[top_classes1[i]] for i in range(len(top_classes1))]
# Probabilities corresponding to those top classes
top_confidence1 = [probas1[i][top_classes1[i]] for i in range(len(top_classes1))]

for i in range(len(class_labels1)):
    for j in range(no_of_class):
        print(f"Sample {i}: {class_labels1[i][j]} :: {top_confidence1[i][j]}")
NOTE: You can also convert this into a DataFrame, with one column for the predicted class and another column for its predicted probability, as sketched below.
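For instance, a minimal sketch of that conversion, assuming pandas is installed and reusing the class_labels1, top_confidence1, and no_of_class names from above:
import pandas as pd

# One row per (sample, ranked class) pair
rows = []
for i in range(len(class_labels1)):
    for j in range(no_of_class):
        rows.append({'sample': i,
                     'predicted_class': class_labels1[i][j],
                     'predicted_value': top_confidence1[i][j]})

df = pd.DataFrame(rows)
print(df.head())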
Upvotes: 0
Reputation: 24742
For those estimators implementing the predict_proba() method, as Justin Peel suggested, you can just use predict_proba() to produce a probability for each prediction.
For those estimators which do not implement predict_proba(), you can construct a confidence interval yourself using the bootstrap concept (repeatedly calculating your point estimate on many sub-samples).
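For example, a minimal sketch of the bootstrap idea, assuming the features_train, labels_train, and sub_main variables from the question: refit the model on many resampled training sets, and use the fraction of refits that vote for each class as a rough confidence score.
import numpy as np
from sklearn.svm import SVC
from sklearn.utils import resample

n_rounds = 100
boot_preds = []
for _ in range(n_rounds):
    # Refit on a bootstrap resample (sampling with replacement) of the training set
    X_boot, y_boot = resample(features_train, labels_train)
    clf = SVC()
    clf.fit(X_boot, y_boot)
    boot_preds.append(clf.predict(sub_main))

boot_preds = np.array(boot_preds)  # shape: (n_rounds, n_test_samples)

# For each test sample, the fraction of rounds voting for a class
# serves as a rough confidence score for that class
for i in range(boot_preds.shape[1]):
    classes, counts = np.unique(boot_preds[:, i], return_counts=True)
    for c, n in zip(classes, counts):
        print('Sample %d, class %s: %.0f%%' % (i, c, 100.0 * n / n_rounds))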
Let me know if you need any detailed examples to demonstrate either of these two cases.
Upvotes: 16
Reputation: 47072
Per the SVC documentation, it looks like you need to change how you construct the SVC:
model = SVC(probability=True)
and then use the predict_proba method:
class_probabilities = model.predict_proba(sub_main)
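A short usage sketch, assuming the features_train, labels_train, and sub_main variables from the question: each row returned by predict_proba holds one sample's probabilities, with columns ordered according to model.classes_, so the per-class percentages from the question can be read off directly.
model = SVC(probability=True)
model.fit(features_train, labels_train)

class_probabilities = model.predict_proba(sub_main)

# Each row is one sample; columns follow the order of model.classes_
for probs in class_probabilities:
    for cls, p in zip(model.classes_, probs):
        print('Class %s: %.0f%%' % (cls, 100 * p))
Note that with probability=True, SVC fits an internal calibration step (Platt scaling), so training takes longer and predict_proba may occasionally disagree with predict.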
Upvotes: 33