Kshitij Bajracharya

Reputation: 851

Training a model when using Naive Bayes

I have a movie review dataset and I want to perform sentiment analysis on it.

I have implemented this using logistic regression. These are the steps that I took in the process (a condensed sketch of the pipeline follows the list):

  1. Removed stop words and punctuation from each row in the dataset.
  2. Split the data into train, validation, and test sets.
  3. Created a vocabulary of words from the training set.
  4. Added every word in the vocabulary as a feature. If the word appears in the current row, its TF-IDF value is used as the feature value; otherwise the value is 0.
  5. Trained the model. During training, the sigmoid function is used to compute the hypothesis and cross-entropy loss is used as the cost function; the weights are then updated with gradient descent.
  6. Tuned hyperparameters using the validation set.
  7. Evaluated the model using the test set.
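A condensed sketch of that pipeline on toy data (the data and variable names here are only illustrative, not my actual code):

    import numpy as np
    from collections import Counter

    # toy training data (step 1 already applied: stop words and punctuation removed)
    docs   = ["great movie loved", "terrible plot awful acting", "loved acting"]
    labels = np.array([1, 0, 1])                      # 1 = positive, 0 = negative

    # steps 3-4: vocabulary from the training set, one TF-IDF feature per word
    vocab = sorted({w for d in docs for w in d.split()})
    idf   = {w: np.log(len(docs) / sum(w in d.split() for d in docs)) for w in vocab}

    def tfidf_row(doc):
        counts = Counter(doc.split())
        return np.array([counts[w] / len(doc.split()) * idf[w] for w in vocab])

    X = np.vstack([tfidf_row(d) for d in docs])

    # step 5: sigmoid hypothesis, cross-entropy cost, gradient descent updates
    weights, bias, lr = np.zeros(X.shape[1]), 0.0, 0.1
    for _ in range(500):
        h        = 1 / (1 + np.exp(-(X @ weights + bias)))   # sigmoid hypothesis
        grad     = X.T @ (h - labels) / len(labels)          # gradient of cross-entropy
        weights -= lr * grad
        bias    -= lr * np.mean(h - labels)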

Now, I need to implement the same thing using Naive Bayes and I'm confused about how to approach this problem. I assume the first 4 steps are going to be the same. But what is the training step when using Naive Bayes? What are the loss function and cost function in this case? Where do I use Bayes' theorem to calculate the conditional probability? And how do I update the weights and biases?

I've searched a lot of resources on the web, but I've mostly found implementations using sklearn with model.fit and model.predict. I'm having a hard time figuring out the math behind this and how it could be implemented in vanilla Python.

Upvotes: 0

Views: 1197

Answers (1)

Kalsi

Reputation: 583

In the case of logistic regression or an SVM, the model tries to find the hyperplane that best separates the data, and so these models learn weights and biases.

  1. Naive Bayes, on the other hand, is a probabilistic approach. It relies entirely on Bayes' theorem.

  2. There will be NO weights and biases in NB; there will only be CLASS-WISE probability values for each of the features (i.e., words in the case of text). A short from-scratch sketch follows this list.

  3. To avoid zero probabilities and to handle unseen data (unseen words, in the case of text), use Laplace smoothing.

  4. α is called the smoothing factor, and it is a hyperparameter in NB.

  5. Use log probabilities for numerical stability.
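
A minimal from-scratch sketch of what "training" looks like under these points (toy data; the variable names and counting scheme are just one way to do it). Training is nothing more than counting words per class and turning the counts into Laplace-smoothed log probabilities:

    import math
    from collections import Counter

    # toy training data: (document, label), 1 = positive, 0 = negative
    train = [("loved this movie", 1), ("great acting", 1),
             ("awful movie", 0), ("terrible plot", 0)]

    alpha = 1.0                                    # Laplace smoothing factor (hyperparameter)
    vocab = {w for doc, _ in train for w in doc.split()}

    word_counts  = {0: Counter(), 1: Counter()}    # per-class word counts
    class_counts = Counter()                       # number of documents per class
    for doc, label in train:
        word_counts[label].update(doc.split())
        class_counts[label] += 1

    # log prior P(class) and Laplace-smoothed log likelihood P(word | class)
    log_prior = {c: math.log(class_counts[c] / len(train)) for c in (0, 1)}
    totals    = {c: sum(word_counts[c].values()) for c in (0, 1)}

    def log_likelihood(word, c):
        return math.log((word_counts[c][word] + alpha)
                        / (totals[c] + alpha * len(vocab)))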


  • Test example: "This movie is great"

  • After removing the stop words: "movie great"

  • From the training data, we already know the probability values for the words movie and great for both the +ve and -ve classes. Refer to point 2 above.

  • The probability of great for the +ve class would be greater than its probability for the -ve class, and for the word movie the probabilities could be almost the same. (This depends heavily on your training data; here I am just making an assumption.)

positive class prob = P(+ve) * P(movie|+ve) * P(great|+ve)

negative class prob = P(-ve) * P(movie|-ve) * P(great|-ve)

  • Compare the class probability values and return the class with the higher value (a small prediction sketch follows).
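
Continuing the training sketch above, prediction for the test example could look roughly like this (it assumes the vocab, log_prior and log_likelihood defined in that sketch):

    # sum the log prior and the per-word log likelihoods for each class,
    # then return the class with the higher total score
    def predict(doc):
        words  = [w for w in doc.split() if w in vocab]   # drop words never seen in training
        scores = {c: log_prior[c] + sum(log_likelihood(w, c) for w in words)
                  for c in (0, 1)}
        return max(scores, key=scores.get), scores

    print(predict("movie great"))   # returns class 1 (positive) if its score is higher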

P.S.

If the number of words in the sentence is large, the class probability value becomes extremely small (it can underflow to zero). Using log probabilities solves this problem.
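
A tiny illustration of the underflow problem and why summing logs avoids it:

    import math

    probs = [1e-4] * 100                       # 100 word probabilities of 0.0001 each
    print(math.prod(probs))                    # 0.0 -- the product underflows to zero
    print(sum(math.log(p) for p in probs))     # about -921.0, still fine for comparing classes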

If the word great weren't in the training set, the class probability value would be 0. So use the smoothing factor α (Laplace smoothing).

Refer to the scikit-learn naive Bayes documentation for more detailed info.

Upvotes: 0
