yayu

Reputation: 8088

Basic concepts: Naive Bayes algorithm for classification

I think I understand Naive Bayes more or less, but I have a few questions regarding its implementation for a simple binary text classification task.

Let's say that document D_i is some subset of the vocabulary x_1, x_2, ..., x_n.

There are two classes c_i that any document can fall into, and I want to compute P(c_i|D) for an input document D, which is proportional to P(D|c_i)P(c_i).

I have three questions:

  1. Is P(c_i) #docs in c_i / #total docs, or #words in c_i / #total words?
  2. Should P(x_j|c_i) be the #times x_j appears in D / #times x_j appears in c_i?
  3. Suppose an x_j doesn't exist in the training set; do I give it a probability of 1 so that it doesn't alter the calculations?

For example, let us say that I have a training set of two documents:

training = [("hello world", "good"),
            ("bye world", "bad")]

so the classes would have

good_class = {"hello": 1, "world": 1}
bad_class = {"bye": 1, "world": 1}
all = {"hello": 1, "world": 2, "bye":1}

so now, if I want to compute the probability of a test string being good:

test1 = ["hello", "again"]
p_good = sum(good_class.values())/sum(all.values())
p_hello_good = good_class["hello"]/all["hello"]
p_again_good = 1 # because "again" doesn't exist in our training set

p_test1_good = p_good * p_hello_good * p_again_good

Upvotes: 0

Views: 201

Answers (1)

Devavrata

Reputation: 1785

Since this question is broad, I can only answer in a limited way:

1st: *Is P(c_i) #docs in c_i / #total docs, or #words in c_i / #total words?*

P(c_i) = #docs in c_i / #total docs
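Using the two-document training set from the question, the prior works out to 1/2 for each class. A minimal sketch (variable names are mine, not from the question):

```python
# Sketch: class priors from DOCUMENT counts, not word counts.
from collections import Counter

training = [("hello world", "good"),
            ("bye world", "bad")]

doc_counts = Counter(label for _, label in training)
total_docs = len(training)

p_good = doc_counts["good"] / total_docs  # 1 doc out of 2
p_bad = doc_counts["bad"] / total_docs    # 1 doc out of 2
```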

2nd: *Should P(x_j|c_i) be the #times x_j appears in D / #times x_j appears in c_i?*

As @larsmans pointed out, it is the number of occurrences of the word in class c_i
divided by the total number of words in that class over the whole dataset:

P(x_j|c_i) = #times x_j appears in c_i / #total words in c_i
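In code, using the `good_class` counts from the question, a sketch of this likelihood (the helper function is mine) looks like:

```python
# Sketch: unsmoothed P(x_j | c_i) = count of x_j in class c_i
# divided by the total number of words in that class.
good_class = {"hello": 1, "world": 1}

def likelihood(word, class_counts):
    """Class-conditional word probability, without smoothing."""
    total_words = sum(class_counts.values())
    return class_counts.get(word, 0) / total_words

p_hello_good = likelihood("hello", good_class)  # 1 occurrence out of 2 words
```

Note this gives 1/2, not the 1/1 that `good_class["hello"]/all["hello"]` in the question would give; the denominator is the class's total word count, not the word's corpus-wide count.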

3rd: *Suppose an x_j doesn't exist in the training set; do I give it a probability of 1 so that it doesn't alter the calculations?*

No. For that we have Laplace correction, also called additive smoothing. It replaces the likelihood with

P(x_j|c_i) = (#times x_j appears in c_i + 1) / (#total words in c_i + |V|)

where |V| is the vocabulary size. This neutralizes the effect of features that never occur in a class: an unseen word gets a small nonzero probability instead of zeroing out the whole product (and giving it probability 1 would wrongly inflate the product).
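Putting the three pieces together on the question's training set, a minimal end-to-end sketch with add-one smoothing (function and variable names are mine; log-probabilities are used to avoid underflow):

```python
import math
from collections import Counter, defaultdict

# Training set from the question.
training = [("hello world", "good"),
            ("bye world", "bad")]

# Per-class word counts and per-class document counts.
class_words = defaultdict(Counter)
doc_counts = Counter()
for text, label in training:
    class_words[label].update(text.split())
    doc_counts[label] += 1

# Vocabulary size |V| for the smoothing denominator.
vocab = {w for counts in class_words.values() for w in counts}

def log_posterior(words, label):
    """log P(c) + sum_j log P(x_j | c), with add-one (Laplace) smoothing."""
    prior = doc_counts[label] / sum(doc_counts.values())
    total_words = sum(class_words[label].values())
    score = math.log(prior)
    for w in words:
        # Counter returns 0 for unseen words; smoothing keeps this nonzero.
        score += math.log((class_words[label][w] + 1) / (total_words + len(vocab)))
    return score

test1 = ["hello", "again"]
best = max(doc_counts, key=lambda c: log_posterior(test1, c))
```

Here "again" is unseen, so it contributes the same smoothed factor 1/(2+3) to both classes, while "hello" favors "good"; the sketch therefore classifies `test1` as "good".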

Upvotes: 1
