Reputation:
I've come across an issue using Naive Bayes for a document classification problem with multiple classes.
I was wondering about P(C), the prior class probabilities we estimate initially: they can keep changing over time. For instance, for the classes [music, sports, news] the initial probabilities are [.25, .25, .50].
Now suppose that during a certain month we get a deluge of sports-related documents (e.g. 80% sports). Then our Naive Bayes will fail, because it relies on a prior that says only 25% of documents are sports. How do we deal with such a situation?
Upvotes: 1
Views: 224
Reputation: 66835
If you know that the priors change, you should refit them periodically (by gathering a new training set representative of the new priors). In general, every ML method will lose accuracy if the priors change and you do not give this information to your classifier. You need at least some kind of feedback for the classifier. If, for example, you have a closed loop where you find out whether each classification was right or not, and you assume that only the priors change, then you can simply learn the changing priors online (through any optimization method, since fitting new priors is rather easy).
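A minimal sketch of that closed-loop idea, assuming the per-class likelihoods P(x|C) stay fixed and only the priors P(C) drift (the class names, the smoothing scheme, and the simulated 80%-sports feedback stream are all my own illustrative assumptions, not part of the original question):

```python
import numpy as np

# Hypothetical setup: three classes whose likelihoods P(x|C) were learned
# earlier and are assumed stable; only the priors P(C) are assumed to drift.
classes = ["music", "sports", "news"]
priors = np.array([0.25, 0.25, 0.50])   # initial P(C) from the question
counts = np.ones(len(classes))          # Laplace-smoothed feedback counts

def update_priors(true_class_idx):
    """Online prior update from a single labeled feedback signal."""
    global priors
    counts[true_class_idx] += 1
    priors = counts / counts.sum()

def posterior(likelihoods):
    """Combine fixed likelihoods P(x|C) with the current priors P(C)."""
    unnorm = np.asarray(likelihoods) * priors
    return unnorm / unnorm.sum()

# Simulate a month dominated by sports documents: the closed-loop feedback
# stream is ~80% sports, so the learned priors drift toward that mix.
rng = np.random.default_rng(0)
for label in rng.choice(len(classes), size=1000, p=[0.1, 0.8, 0.1]):
    update_priors(label)

print(np.round(priors, 2))  # priors now track the drifted class distribution
```

The same effect can be had with an off-the-shelf classifier by refitting only its prior term from the feedback counts, which is much cheaper than retraining the whole model.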
More generally, you should look into the concept drift phenomenon.
Upvotes: 1