Reputation: 851
I have a movie review dataset and I want to perform sentiment analysis on it.
I have implemented this using logistic regression. Following are the steps that I took in the process:
Now, I need to implement the same thing using Naive Bayes and I'm confused as to how to approach this problem. I assume the first 4 steps are going to be the same. But what is the training step when using Naive Bayes? What are the loss function and cost function in this case? Where do I use Bayes' theorem to calculate the conditional probabilities? And how do I update the weights and biases?
I've searched a lot of resources on the web, but I've mostly found implementations using sklearn with model.fit and model.predict. I'm having a hard time figuring out the math behind this and how it could be implemented in vanilla Python.
Upvotes: 0
Views: 1197
Reputation: 583
In the case of logistic regression or SVM, the model is trying to find the hyperplane that best separates the data, and so those models learn weights and biases.

Naive Bayes, on the other hand, is a probabilistic approach. It depends entirely on Bayes' theorem. There will be NO weights and biases in NB; there will only be CLASS-WISE probability values for each of the features (i.e., words, in the case of text).
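To make the "training step" concrete: training NB is just counting. A minimal sketch in vanilla Python (the toy data and variable names here are my own, invented for illustration):

```python
from collections import Counter

# Toy training data: (tokenized review, label) pairs -- invented for illustration
train = [
    (["movie", "great"], "+ve"),
    (["great", "acting"], "+ve"),
    (["boring", "movie"], "-ve"),
    (["terrible", "acting"], "-ve"),
]

# "Training" = counting: how often each word occurs in each class,
# and how many documents each class has (for the priors).
word_counts = {"+ve": Counter(), "-ve": Counter()}
doc_counts = Counter()
for words, label in train:
    word_counts[label].update(words)
    doc_counts[label] += 1

# Class priors: P(+ve) and P(-ve)
total_docs = sum(doc_counts.values())
priors = {c: doc_counts[c] / total_docs for c in doc_counts}
print(priors)  # {'+ve': 0.5, '-ve': 0.5}
```

There is no gradient descent and nothing to iterate on, so there is no loss or cost function to optimize; the counts themselves are the model.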
To avoid zero probabilities, i.e., to handle unseen data (words, in the case of text), use Laplace smoothing. α is called the smoothing factor, and it is a hyperparameter in NB.
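With Laplace smoothing, the likelihood of a word w in class c becomes P(w | c) = (count(w, c) + α) / (total words in c + α · |V|), where |V| is the vocabulary size. A minimal sketch (the function and variable names are my own):

```python
from collections import Counter

def smoothed_prob(word, class_counts, vocab_size, alpha=1.0):
    """P(word | class) with Laplace smoothing.

    class_counts: Counter of word frequencies for ONE class.
    vocab_size:   number of distinct words in the whole training set.
    """
    total = sum(class_counts.values())
    return (class_counts[word] + alpha) / (total + alpha * vocab_size)

# Toy counts for the +ve class -- invented for illustration
pos_counts = Counter({"movie": 1, "great": 2, "acting": 1})
print(smoothed_prob("great", pos_counts, vocab_size=6))   # seen word
print(smoothed_prob("unseen", pos_counts, vocab_size=6))  # nonzero thanks to alpha
```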
Use log probabilities for numerical stability.
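To see why the log matters: multiplying many small probabilities underflows to 0.0 in floating point, while summing their logs stays well-behaved. A made-up illustration:

```python
import math

probs = [0.01] * 200  # 200 word probabilities, each 0.01

product = 1.0
for p in probs:
    product *= p
print(product)  # 0.0 -- underflows to zero

log_score = sum(math.log(p) for p in probs)
print(log_score)  # about -921.03 -- still perfectly usable for comparing classes
```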
Test example: This movie is great
After removing the stopwords: movie great
From the training data, we already know the probability values for the words movie and great, for both the +ve and -ve classes. Refer to STEP 2.
The probability of great for the +ve class would be greater than its probability for the -ve class, while for the word movie the probability values could be almost the same. (This highly depends on your training data; here I am just making an assumption.)
positive class prob = P(+ve) * P(movie | +ve) * P(great | +ve)
negative class prob = P(-ve) * P(movie | -ve) * P(great | -ve)
Here P(+ve) and P(-ve) are the class priors; when the classes are balanced they cancel out, which is why they are sometimes dropped.
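Putting the pieces together in vanilla Python (all counts below are toy numbers invented for illustration; in practice they come from your training data, and the class with the higher score wins):

```python
import math

ALPHA = 1.0     # Laplace smoothing factor
VOCAB_SIZE = 6  # number of distinct words in the (toy) training vocabulary

# Toy per-class word counts and priors -- invented for illustration
counts = {
    "+ve": {"movie": 3, "great": 4, "total": 10},
    "-ve": {"movie": 3, "great": 1, "total": 10},
}
priors = {"+ve": 0.5, "-ve": 0.5}

def log_score(words, label):
    """log P(label) + sum over words of log P(word | label), Laplace-smoothed."""
    score = math.log(priors[label])
    for w in words:
        p = (counts[label].get(w, 0) + ALPHA) / (counts[label]["total"] + ALPHA * VOCAB_SIZE)
        score += math.log(p)
    return score

test = ["movie", "great"]  # "This movie is great" after stopword removal
scores = {label: log_score(test, label) for label in counts}
print(scores)
print("prediction:", max(scores, key=scores.get))  # +ve wins on these toy counts
```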
P.S.

- If the number of words in the sentence is large, the class probability value becomes very, very small. Using log solves this problem.
- If the word great wasn't there in the training set, the class probability value would be 0. So use the smoothing factor α (Laplace smoothing).
Refer to scikit-learn's naive Bayes documentation for more detailed info.
Upvotes: 0