Jonathan Dunne

Reputation: 189

How do I check the accuracy of the result of a text classifier (scikit-learn)?

I'm doing some exploratory work on various toolkits for text classification. I have a question about determining the accuracy of a single result from a text classifier.

Using the 20 newsgroups demo as an example: http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html

Let's assume I have trained my classifier and have my test data ready to check against it. If I pass a single text phrase to the classifier and it returns a correct result (e.g. 'God is love' => soc.religion.christian), how can I tell what the accuracy of that result is?

I also note in the subsequent section of the tutorial that the following command gives the mean prediction accuracy over the whole test set. Again, how can I determine the accuracy for a single one-off test?

np.mean(predicted == twenty_test.target)            
0.912...

As an aside, I note that when using the Watson Conversation API from Python (link below), the API returns a confidence score. Is there something comparable I can implement in scikit-learn?

https://www.ibm.com/watson/developercloud/conversation/api/v1/?python#send_message

"intent": "turn_on",
  "confidence": 0.99
}
  ],
 "output": {
"log_messages": [],
"text": [
  "Ok. Turning on the light."

Thanks, Jonathan

Upvotes: 0

Views: 486

Answers (1)

Abhishek Thakur

Reputation: 17015

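You need to use predict_proba. For each sample you pass in, it returns the estimated probability of every class (values between 0 and 1 that sum to 1), and the probability of the predicted class can serve as a confidence score. Note that this works only for classifiers that implement predict_proba.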

model.fit(X, y)
# one row of class probabilities per sample; columns follow model.classes_
preds = model.predict_proba(single_test_sample)
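
As a fuller illustration, here is a minimal sketch of getting a label plus a confidence value for one phrase. The tiny training corpus, the variable names, and the choice of TfidfVectorizer with MultinomialNB are assumptions made for the example, not the tutorial's exact setup; with the real 20 newsgroups data you would fit on twenty_train.data and twenty_train.target instead.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data (hypothetical); it only stands in for the 20 newsgroups set.
train_texts = [
    "God is love and faith",
    "The church held a prayer service",
    "OpenGL renders 3D graphics on the GPU",
    "The image was rendered with a fast graphics card",
]
train_labels = [
    "soc.religion.christian",
    "soc.religion.christian",
    "comp.graphics",
    "comp.graphics",
]

# Vectorizer and classifier chained so raw strings can be passed directly.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

# predict_proba expects a collection of documents, so wrap the phrase in a list
# and take the first (only) row of the result.
probs = model.predict_proba(["God is love"])[0]

# Columns of the probability row are ordered like model.classes_.
best = int(np.argmax(probs))
predicted_label = model.classes_[best]
confidence = float(probs[best])

print(predicted_label, round(confidence, 3))
```

Keep in mind that these values are the model's own probability estimates, not a measured accuracy; depending on the classifier they can be poorly calibrated, so treat them as a relative confidence rather than a guarantee.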

Upvotes: 2
