Rachelle

Reputation: 267

Bayesian Network Output

I'm using a dataset to predict whether a person has diabetes. If the number of negative (non-diabetic) observations in my data set is 10 times larger than the number of positive ones, does it follow that my Bayesian network will only learn and predict negatives because that class has more observations than the other?

Upvotes: 0

Views: 196

Answers (1)

Zhubarb

Reputation: 11905

Let's say your prior outcome probabilities are: P(not_diabetic) = 0.9 and P(diabetic) = 0.1.

This is an example of an imbalanced training set, and it would have a detrimental effect on the learner's behaviour. Classifying the cases that have P(diabetic) > 0.5 as Diabetic and the rest as Non_diabetic would not give good results in your case: the skewed prior pushes almost every posterior below the 0.5 threshold.
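A minimal numerical sketch of why this happens (the likelihood-ratio values below are made-up illustrations, not from your data): with prior odds of 1:9 against diabetes, the evidence for a case has to be nine times more likely under "diabetic" than under "not diabetic" just to reach a posterior of 0.5.

```python
def posterior_diabetic(prior_d, lr):
    """P(diabetic | evidence) from the prior P(diabetic) and the
    likelihood ratio lr = P(evidence | diabetic) / P(evidence | not_diabetic)."""
    prior_odds = prior_d / (1.0 - prior_d)
    post_odds = prior_odds * lr          # Bayes' rule in odds form
    return post_odds / (1.0 + post_odds)

# With the P(diabetic) = 0.1 prior above, evidence must be 9x more
# likely under 'diabetic' merely to reach the 0.5 threshold:
print(posterior_diabetic(0.1, 9.0))  # 0.5 exactly
print(posterior_diabetic(0.1, 4.0))  # ~0.31 -> classified Non_diabetic
print(posterior_diabetic(0.5, 4.0))  # ~0.80 under balanced priors
```

The same evidence (a likelihood ratio of 4) flips the decision depending only on the class balance of the training set, which is exactly the effect the question is asking about.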

When you validate your classifier, you need to use a method that takes into consideration the effect of the imbalanced priors of your training set on your posterior probabilities, such as the Bayesian Information Reward.
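To see why plain accuracy is misleading here (this sketch uses a simpler imbalance-aware metric, balanced accuracy, rather than the Bayesian Information Reward itself; the labels are synthetic): a classifier that always predicts the majority class scores 90% accuracy on your data while being useless.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so the minority class counts equally."""
    recalls = []
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

y_true = [1] * 10 + [0] * 90   # 10% diabetic, 90% not, as in the question
y_pred = [0] * 100             # degenerate "always negative" classifier

print(accuracy(y_true, y_pred))           # 0.9 -- looks good
print(balanced_accuracy(y_true, y_pred))  # 0.5 -- no better than chance
```

Any metric that scores each class separately (balanced accuracy, per-class recall, or the information-reward family mentioned above) will expose this failure mode where raw accuracy hides it.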

You can have a look at this paper for a general discussion of the effects of imbalanced training sets on Bayesian classifiers.

Upvotes: 1
