George Foster

Reputation: 131

Natural language classifier returns classifications for untrained items

I am confused about how NLC works. My expectation is that when it is asked to classify text that has no relation to its training data, it should return no results, or results with very low confidence scores.

I have trained a model with a set of training data, and when I attempt to classify text that is outside of the training data, I am getting results with high confidence values (~60%).

Here's an example of my training data:

foo,1,2,3,4
bar,1,2,3,4
baz,1,2,3,4

When I try to classify the text "This should not exist" I receive a high confidence that this text is "1".

Is it correct that I should be getting values back in this case? Am I training the classifier on foo, bar, and baz incorrectly? If not, what should I expect from the NLC service?

Upvotes: 0

Views: 205

Answers (1)

German Attanasio

Reputation: 23673

Imagine that you have 3 buckets and you have to throw a coin into one of them. Each bucket has a 33.3% chance of getting the coin. The same happens with the Natural Language Classifier service: it is trained to classify input text into predefined classes.

If you create a classifier with 3 classes and you try to classify text that wasn't in the training data, NLC will still assign your sentence to one of the three classes you defined. If the top class gets 60%, then the other two buckets share the remaining 40%.

Sometimes you will get a high score, and that's normal when your classes are very different from each other.
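To make the bucket picture concrete, here is a minimal sketch in Python. It is not NLC's actual implementation, which isn't described here; it assumes a generic classifier that normalizes raw per-class scores with a softmax so the confidences add up to 1. The function names and the raw score values are made up for illustration.

```python
import math

def softmax(scores):
    """Turn raw per-class scores into confidences that sum to 1."""
    peak = max(scores)
    # Subtract the max before exponentiating for numerical stability.
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rank_classes(raw_scores, labels):
    """Pair each label with its confidence, highest confidence first."""
    confidences = softmax(raw_scores)
    return sorted(zip(labels, confidences), key=lambda pair: pair[1], reverse=True)

labels = ["1", "2", "3"]

# Hypothetical raw scores for a sentence unrelated to the training data.
# No class is a good match, yet one of them still has to come out on top.
unrelated = rank_classes([0.9, 0.1, -0.2], labels)

# Hypothetical raw scores when one class stands far apart from the others.
separated = rank_classes([4.0, 0.5, 0.2], labels)

for name, ranked in (("unrelated", unrelated), ("separated", separated)):
    total = sum(conf for _, conf in ranked)
    print(name, [(label, round(conf, 3)) for label, conf in ranked], "sum =", round(total, 3))
```

In both cases the confidences sum to 1, so the top class always gets more than an even 1/3 share unless all scores are tied. There is no "none of the above" bucket unless you train one yourself.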

Upvotes: 0
