Reputation: 189
I'm doing some exploratory work on various toolkits for text classification, and I have a question about determining the accuracy of an individual result from a text classifier.
Using the 20 newsgroups demo as an example: http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html
Let's assume I have trained my classifier and have my test data ready to check against it. If I pass a single text phrase to the classifier and it returns a correct result (e.g. 'God is love' => soc.religion.christian), how can I tell what the accuracy of that result is?
I also note in the subsequent section of the tutorial that the following command gives the mean prediction accuracy over the whole test set. Again, how can I determine the accuracy for a single one-off test?
np.mean(predicted == twenty_test.target)
0.912...
As an aside, I note that Watson Conversation Classifier API calls in Python (link below) return a confidence score. Is there something comparable I can implement in scikit-learn?
https://www.ibm.com/watson/developercloud/conversation/api/v1/?python#send_message
"intent": "turn_on",
"confidence": 0.99
}
],
"output": {
"log_messages": [],
"text": [
"Ok. Turning on the light."
Thanks, Jonathan
Upvotes: 0
Views: 486
Reputation: 17015
You need to use predict_proba. This will give you a confidence score between 0 and 1 for each class.
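model.fit(X, y)
# single_test_sample must be 2D (one row per sample); the result has one row of per-class probabilities per sample
preds = model.predict_proba(single_test_sample)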
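To make that concrete for the text case, here is a minimal sketch of getting a per-prediction confidence out of a pipeline like the one in the tutorial. The four training sentences and their labels are made up for illustration (they stand in for the 20 newsgroups data), and the choice of TfidfVectorizer with MultinomialNB is an assumption, not the only option.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Tiny made-up training set (hypothetical data standing in for 20 newsgroups).
train_texts = [
    "God is love and faith",
    "the church held a prayer service",
    "OpenGL rendering runs fast on the GPU",
    "the graphics card renders 3D images",
]
train_labels = [
    "soc.religion.christian",
    "soc.religion.christian",
    "comp.graphics",
    "comp.graphics",
]

# Vectorizer and classifier chained so raw strings can be passed directly.
text_clf = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", MultinomialNB()),
])
text_clf.fit(train_texts, train_labels)

# predict_proba takes a list of documents, even for a single phrase.
# The result has one row per document and one column per class.
probs = text_clf.predict_proba(["God is love"])[0]

# Columns follow the order of classes_, so argmax gives the predicted label.
best = int(np.argmax(probs))
predicted_label = text_clf.classes_[best]
confidence = float(probs[best])

print(predicted_label, round(confidence, 3))
```

Note that this number is the model's own probability estimate, not a measured accuracy, and naive Bayes in particular tends to push its probabilities towards 0 or 1. Not every estimator offers predict_proba either: SGDClassifier with the default hinge loss, for example, only exposes decision_function.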
Upvotes: 2