scikit-learn get certainty of classification / score of the classifier for the chosen category

Question

I am doing some multiclass text classification, and it works well for my needs:

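import numpy
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# my_tokenizer and stopWords are defined elsewhere in my code
classifier = Pipeline([
    ('vect', CountVectorizer(tokenizer=my_tokenizer, stop_words=stopWords, ngram_range=(1, 2), min_df=2)),
    ('tfidf', TfidfTransformer(norm='l2', use_idf=True, smooth_idf=True, sublinear_tf=False)),
    ('clf', MultinomialNB(alpha=0.01, fit_prior=True))])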

categories = [list of my possible categories]

# Learning

news = [list of news already categorized]
news_cat = [the category of the corresponding news]

# Map each category name to its index (searchsorted assumes categories is sorted)
news_target_cat = numpy.searchsorted(categories, news_cat)

classifier = classifier.fit(news, news_target_cat)

# Categorizing

news = [list of news not yet categorized]

predicted = classifier.predict(news)

for i, pred_cat in enumerate(predicted):
    print(news[i])
    print(categories[pred_cat])

Now, what I would like to have along with the predicted category is its 'certainty' from the predictor (e.g. 0.0 -> "I rolled a die to choose a category" up to 1.0 -> "Nothing will make me change my mind about the category of that news"). How can I get that certainty value, i.e. the predictor's score for the chosen category?

solomkinmv · Accepted Answer

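If you need something like the probability of each category, use the classifier's predict_proba() method. It returns one row per sample and one column per class, with the columns ordered as in classifier.classes_.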

See the scikit-learn documentation for MultinomialNB.predict_proba.
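As a rough illustration of that suggestion, here is a minimal sketch that fits a simplified version of the pipeline on a tiny made-up dataset and reads the probability of the predicted class. The toy sentences, the category names, and the variable names (train_news, new_news, certainty) are assumptions for the example and not part of the original question; the custom tokenizer and stop words are left out so the snippet runs on its own.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for the real categorized news.
train_news = [
    'the team won the football match',
    'the striker scored a late goal',
    'parliament passed the new budget law',
    'the minister announced an election',
]
train_cat = ['sport', 'sport', 'politics', 'politics']

# Simplified pipeline (default tokenizer, no stop words) for illustration.
classifier = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', MultinomialNB(alpha=0.01)),
])
classifier.fit(train_news, train_cat)

new_news = ['the goal decided the match', 'the election law was passed']

# One row per document, one column per class, ordered as classifier.classes_.
proba = classifier.predict_proba(new_news)

# Index of the most probable class for each document.
best = np.argmax(proba, axis=1)
predicted = classifier.classes_[best]

# Probability assigned to the chosen class: the "certainty" asked about.
certainty = proba[np.arange(len(new_news)), best]

for text, cat, p in zip(new_news, predicted, certainty):
    print(text, '->', cat, round(float(p), 3))
```

Note that the highest probability cannot fall below 1 / number of classes, so "rolled a die" corresponds to that value rather than to 0.0. Naive Bayes probabilities also tend to be pushed toward 0 and 1, so it is probably safer to treat them as a relative score than as a calibrated confidence.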
