Yige Song

Reputation: 373

Choosing a naive Bayes model for continuous features and multiple labels

Imagine I have a data set whose feature values are continuous and which has more than two possible labels (e.g. rain, sunny, windy). Which naive Bayes model should I use in sklearn?

I am considering Gaussian or multinomial. However, multinomial works for discrete features, and when I tried Gaussian, the prediction accuracy was no better than random guessing.

Upvotes: 0

Views: 567

Answers (2)

Merihan Daniel

Reputation: 1

Usually, when your data is continuous, you apply Gaussian naive Bayes, or you transform the data into a discrete format, for example by converting temperature values to low, medium, and high.
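As a rough illustration of the Gaussian route, here is a minimal sketch using `GaussianNB` with three string labels. The data is synthetic (generated with `make_blobs`) and only stands in for real weather measurements; the feature count, cluster spread, and split size are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for continuous measurements (assumption):
# 3 classes, 4 continuous features.
X, y_idx = make_blobs(
    n_samples=600, centers=3, n_features=4, cluster_std=2.0, random_state=0
)
labels = np.array(["rain", "sunny", "windy"])
y = labels[y_idx]  # string labels can be passed to the classifier as-is

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = GaussianNB()
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
chance = 1.0 / len(labels)  # accuracy expected from uniform random guessing
print(f"test accuracy: {accuracy:.3f} (chance level: {chance:.3f})")
```

Comparing the held-out accuracy against the chance level is a quick way to tell whether a model on real data is actually learning anything.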

Your Gaussian model should not perform like random selection; there is probably something wrong with the model or the data.

Some things to check before you apply a Gaussian model:

  1. Does your data follow a normal distribution? (You can either plot a histogram or run a Shapiro-Wilk test.)
  2. Are your features independent of each other? (Did you check for dependency? Use a Spearman or Pearson correlation test.)
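Both checks can be sketched with `scipy.stats`. The three arrays below are hypothetical samples for a single class, generated only for illustration; with real data you would likely run the same checks per feature within each class, since Gaussian naive Bayes fits one Gaussian per feature and class.

```python
import numpy as np
from scipy.stats import shapiro, spearmanr

rng = np.random.default_rng(0)

# Hypothetical samples of three features within one class (assumption).
temperature = rng.normal(loc=20.0, scale=3.0, size=200)  # roughly Gaussian
rainfall = rng.exponential(scale=2.0, size=200)          # strongly skewed
humidity = 0.8 * temperature + rng.normal(0.0, 1.0, size=200)  # tied to temperature

# Shapiro-Wilk test: a small p-value is evidence against normality.
_, p_temperature = shapiro(temperature)
_, p_rainfall = shapiro(rainfall)
print(f"Shapiro-Wilk p-value, temperature: {p_temperature:.3g}")
print(f"Shapiro-Wilk p-value, rainfall:    {p_rainfall:.3g}")

# Spearman rank correlation: |rho| close to 1 indicates strongly dependent features.
rho, p_rho = spearmanr(temperature, humidity)
print(f"Spearman rho, temperature vs humidity: {rho:.3f}")
```

A skewed feature such as the hypothetical `rainfall` may benefit from a transform (for example a log) or from discretization before fitting.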

Upvotes: 0

Bohniti

Reputation: 81

The multinomial and categorical variants of Naive Bayes Classification (NBC) work with discrete values. To use them, you have to discretize all features which are continuous. For more details, this could help

Anyway, multinomial naive Bayes is an option once the features are discretized, and it handles more than two labels. Keep in mind that one-hot encoding (OneHotEncoder in sklearn) applies to the discretized features; sklearn classifiers accept the class labels directly.
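A minimal sketch of the discretize-then-multinomial route, assuming the bundled iris data as a stand-in for a continuous, three-label problem. `KBinsDiscretizer` bins each feature and one-hot encodes the bins in one step; the number of bins and the binning strategy are arbitrary choices for the example. The labels are passed as plain strings.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer

iris = load_iris()
X = iris.data                       # four continuous features
y = iris.target_names[iris.target]  # three string labels

# Bin each continuous feature into 5 equal-width intervals and one-hot encode
# the bin membership, so the classifier sees non-negative indicator features.
pipeline = make_pipeline(
    KBinsDiscretizer(n_bins=5, encode="onehot-dense", strategy="uniform"),
    MultinomialNB(),
)

# 5-fold cross-validated accuracy; binning is re-fit inside each fold.
scores = cross_val_score(pipeline, X, y, cv=5)
print(f"mean accuracy: {scores.mean():.3f}")
```

Putting the discretizer inside the pipeline keeps the bin edges from being learned on the test folds.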

Upvotes: 1
