Reputation: 389
I have written a simple multinomial Naive Bayes classifier in Python. The code predicts the correct labels for the BBC news dataset, but when I use the prior P(x) probability in the denominator to output scores as probabilities, I get incorrect values (like probabilities > 1). Below I attach my code:
The entire process is based on this formula I learnt from the Wikipedia article about Naive Bayes:
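For reference, the multinomial Naive Bayes formula from that article is presumably the following, where x_i is the count of word i in the document and p_ki is the probability of word i under class C_k:

```latex
p(C_k \mid \mathbf{x}) = \frac{p(C_k)\, p(\mathbf{x} \mid C_k)}{p(\mathbf{x})},
\qquad
p(\mathbf{x} \mid C_k) = \frac{\left(\sum_i x_i\right)!}{\prod_i x_i!} \prod_i p_{ki}^{\,x_i}
```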
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(stop_words='english', min_df=5, ngram_range=(1, 1))
features = vectorizer.fit_transform(data.news).toarray()
print(features.shape)
# (2225, 9138)
As a result, I get 9138 features for each article in the dataset.
I calculate pki as follows:
import numpy as np

def count_word_probability(features):
    V_size = features.shape[1]
    alpha = 1
    total_counts_for_each_word = np.sum(features, axis=0)
    total_count_of_words = np.sum(total_counts_for_each_word)
    probs = (alpha + total_counts_for_each_word) / ((V_size * alpha) + total_count_of_words)
    return probs
Basically, this function computes the total frequency of each word across all articles with a particular label (e.g. business) and divides by the total number of words in those articles. It also applies Laplace smoothing (alpha = 1) to account for words with zero frequency.
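The per-class loop implied here can be sketched as follows (a hypothetical helper, `per_class_word_probs`, not from the original post; it assumes `features` is the dense document-term matrix and `labels` is an integer array of category ids):

```python
import numpy as np

def per_class_word_probs(features, labels, n_classes=5, alpha=1):
    probs = []
    for c in range(n_classes):
        class_features = features[labels == c]   # rows belonging to class c
        counts = class_features.sum(axis=0)      # per-word counts within class c
        total = counts.sum()                     # total words in class c
        # Laplace smoothing with the full vocabulary size
        probs.append((counts + alpha) / (total + alpha * features.shape[1]))
    return probs
```

Each returned array sums to 1, as a proper per-class word distribution should.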
labels_probs = [ len(data.index[data['category_id'] == i ]) / len(data) for i in range(5)]
import math
from scipy.special import factorial

def scaling_term(doc):
    term = math.factorial(np.sum(doc)) / np.prod(factorial(doc))
    return term
The scaling function above divides the factorial of the total word count in an article by the product of the factorials of the individual word counts.
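As a side note, these factorials grow astronomically for long documents (and `scipy.special.factorial` returns inf beyond ~170), so a log-space variant may be safer. This is an alternative sketch, not part of the original code, using the identity log(n!) = lgamma(n + 1):

```python
import math

def log_scaling_term(doc):
    # log(n!) = lgamma(n + 1); staying in log space avoids overflow
    total = math.lgamma(sum(doc) + 1)
    return total - sum(math.lgamma(x + 1) for x in doc)
```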
def nb_constant(article, labels_probs, word_probs):
    s_term = scaling_term(article)
    evidence = [np.log(s_term) + np.sum(article * np.log(word_probs[i])) + np.log(labels_probs[i])
                for i in range(len(word_probs))]
    evidence = np.sum(evidence)
    return evidence
So, the last function above calculates the denominator (the evidence P(x)). It sums up P(x|Ck)·P(Ck) over all article classes:
def naive_bayes(article, label_probs, words_probs):
    class_probs = []
    s_term = scaling_term(article)
    constant_term = nb_constant(article, label_probs, words_probs)
    for cl in range(len(label_probs)):
        class_prob = (np.log(s_term) + np.sum(article * np.log(words_probs[cl])) + np.log(label_probs[cl])) / constant_term
        class_probs.append(class_prob)
    class_probs = np.exp(np.array(class_probs))
    return class_probs
Without the constant term, this function outputs the correct label for any custom text I feed it, but the scores are all uniform and close to zero across classes. When I divide by the constant term to get real probability values that sum to one, I get weird results, like a probability of 1.25 for every class. I am definitely missing something in theory, because I don't know much about probability theory and math. I would appreciate any help. Thanks.
Upvotes: 1
Views: 1381
Reputation: 389
Thanks to Robert Dodier, I figured out what the issue was. Before dividing by the constant (the evidence), make sure to exponentiate the numerator's logs back to probabilities. Also, make sure to exponentiate each class's term in the evidence before summing them up.
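To make the fix concrete, here is a minimal sketch (the function name `naive_bayes_fixed` is hypothetical, and it uses the standard log-sum-exp trick for numerical stability; the scaling term cancels between numerator and denominator, so it can be dropped):

```python
import math

def naive_bayes_fixed(article, label_probs, word_probs):
    # Per-class log numerators: log P(Ck) + sum_i x_i * log p_ki
    log_numerators = []
    for cl in range(len(label_probs)):
        log_num = math.log(label_probs[cl]) + sum(
            x * math.log(p) for x, p in zip(article, word_probs[cl]))
        log_numerators.append(log_num)
    # Exponentiate BEFORE normalizing: a ratio of logs is not a
    # ratio of probabilities. Subtract the max first for stability.
    m = max(log_numerators)
    exps = [math.exp(v - m) for v in log_numerators]
    total = sum(exps)   # this is the evidence (up to a constant factor)
    return [e / total for e in exps]
```

With this change the returned values are proper posteriors that sum to one.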
Upvotes: 1