Christian

Reputation: 609

Concept of Naive Bayes for demonstration purposes, how to calculate word probabilities

I need to demonstrate the Bayesian spam filter in school.

To do this I want to write a small Java application with GUI (this isn't a problem).

I just want to make sure that I really grasped the concept of the filter before starting to write my code. So I will describe what I am going to build and how I will program it, and would be really grateful if you could give a "thumbs up" or "thumbs down".

Remember: It is for a short presentation, just to demonstrate. It does not have to be performant or anything like that ;)

I imagine the program having 2 textareas.

In the first I want to enter a text, for example

"The quick brown fox jumps over the lazy dog"

I then want to have two buttons under this field with "good" or "bad".

When I hit one of the buttons, the program counts the appearances of each word in each category.

So for example, when I enter the following texts:

"hello you viagra" | bad

"hello how are you" | good

"hello drugs viagra" | bad

For words I do not know, I assume a probability of 0.5.

My "database" then looks like this:

<word>, <# times word appeared in bad message>

hello, 2
you, 1
viagra, 2
how, 0
are, 0
drugs, 1
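
If it helps to make this concrete, here is a minimal Java sketch of that counting step (the class and method names such as WordCounter and train are just illustrative, not part of the question):

    import java.util.HashMap;
    import java.util.Map;

    // Minimal sketch of the counting step; names are illustrative.
    public class WordCounter {
        private final Map<String, Integer> badCounts = new HashMap<>();
        private final Map<String, Integer> goodCounts = new HashMap<>();

        // Called when the "good" or "bad" button is pressed.
        public void train(String text, boolean isBad) {
            Map<String, Integer> counts = isBad ? badCounts : goodCounts;
            for (String word : text.toLowerCase().split("\\s+")) {
                counts.merge(word, 1, Integer::sum);
            }
        }

        public int badCount(String word)  { return badCounts.getOrDefault(word, 0); }
        public int goodCount(String word) { return goodCounts.getOrDefault(word, 0); }
    }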

In the second textarea I then want to enter a text to evaluate if it is "good" or "bad".

So for example:

"hello how is the viagra selling"

The algorithm then takes the whole text apart and looks up, for every word, its probability of appearing in a "bad" message.

This is now where I'm stuck:

If I calculate the probability of a word appearing in a bad message as (# times it appeared in bad messages) / (# times it appeared in all messages), the above text would have 0 probability of being in either category, because:

"how" only ever appeared in a good message, so its "bad" probability is 0, while "viagra" only ever appeared in bad messages, so its "good" probability is 0.

When I now multiply the single probabilities, this gives 0 in both cases.

Could you please explain how I calculate the probability for a single word to be "good" or "bad"?

Best regards and many thanks in advance

Upvotes: 1

Views: 98

Answers (1)

Artem Sobolev

Reputation: 6079

For unseen words you would like to do Laplace smoothing. What does that mean? Having a count of zero for some word is counterintuitive, since it implies that the probability of this word is 0, which is false for any word you can imagine :-) Thus you want to add a small but positive probability to every word.
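For instance, assuming add-one smoothing (a = 1) and the counts from the question: the two bad messages contain 6 word occurrences in total and the training set has 6 distinct words, so p(how|bad) = (0 + 1) / (6 + 1 * 6) = 1/12 instead of 0, and p(viagra|bad) = (2 + 1) / 12 = 1/4. (The exact form of the normalizer is derived in UPD 2 below.)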

Also, consider using logarithms. Long messages will have many words with probability < 1. When you multiply lots of small floating-point numbers on a computer you can easily run into numerical issues (underflow). In order to overcome this, you may note that:

log (p1 * ... * pn) = log p1 + ... + log pn

So we traded n multiplications of small numbers for n additions of relatively big (and negative) ones. Then you can exponentiate the result to obtain a probability estimate.
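
A minimal Java sketch of this trick (the probability values are made up purely for illustration):

    public class LogTrick {
        public static void main(String[] args) {
            // Example per-word probabilities (values are illustrative).
            double[] wordProbs = {0.25, 0.083, 0.083};

            // Multiply many small probabilities by summing their logarithms instead.
            double logSum = 0.0;
            for (double p : wordProbs) {
                logSum += Math.log(p);   // log(p1 * ... * pn) = log p1 + ... + log pn
            }

            // Exponentiate at the very end to get back a (possibly tiny) probability.
            System.out.println(Math.exp(logSum));
        }
    }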

UPD: Actually, it's an interesting subtopic for your demo. It shows a drawback of NB (outputting zero probabilities) and a way one can fix it. And it's not an ad-hoc patch, but a result of applying the Bayesian approach (it's equivalent to adding a prior).

UPD 2: Didn't notice it the first time, but it looks like you got the Naive Bayes concept wrong, especially the Bayesian part of it.

Essentially, NB consists of 2 components:

  1. We use Bayes' rule for the posterior distribution over class labels. This gives us p(class|X) = p(X|class) p(class) / p(X), where p(X) is the same for all classes, so it doesn't have any influence on the ranking of the classes. Another way to say the same thing is that p(class|X) is proportional to p(X|class) p(class) (up to a constant). As you may have guessed already, that's where the Bayes comes from.
  2. The formula above does not make any model assumptions; it's a law of probability theory. However, it's too hard to apply directly, since p(X|class) denotes the probability of encountering message X in a class, and there is no way we would have enough data to estimate the probability of every possible message. So here comes our model assumption: we say that the words of a message are independent (which is obviously wrong, hence the method is Naive). This leads us to p(X|class) = p(x1|class) * ... * p(xn|class), where n is the number of words in X (see the sketch right after this list).
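
Here is a minimal Java sketch of that decision rule (all names are illustrative; the per-word probabilities are assumed to come from a smoothed estimate like the one described next, and working in log space as suggested above):

    import java.util.Map;

    // Sketch of the NB rule: p(class|X) is proportional to p(class) * p(x1|class) * ... * p(xn|class).
    public class NaiveBayesRule {

        // Unnormalized log-posterior for one class.
        static double logScore(String[] words, double classPrior, Map<String, Double> wordProb) {
            double score = Math.log(classPrior);                // log p(class)
            for (String w : words) {
                // + log p(x_i|class); the crude fallback stands in for words missing from the map,
                // which a proper Laplace-smoothed estimate would handle anyway.
                score += Math.log(wordProb.getOrDefault(w, 1e-6));
            }
            return score;
        }

        // Compare the two unnormalized scores; p(X) is the same for both classes and cancels out.
        static boolean isBad(String[] words, double pBad, Map<String, Double> probBad,
                             double pGood, Map<String, Double> probGood) {
            return logScore(words, pBad, probBad) > logScore(words, pGood, probGood);
        }
    }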

Now we need to somehow estimate the probabilities p(x|class). Here x is not a whole message, but just a single word. Intuitively, the probability of getting some word from a given class is equal to the number of occurrences of that word in that class divided by the total size of the class: #(word, class) / #(class) (or, we could use Bayes' rule once again: p(x|class) = p(x, class) / p(class)).

Accordingly, since p(x|class) is a distribution over the xs, we need it to sum to 1. Thus, if we apply Laplace smoothing by saying p(x|class) = (#(x, class) + a) / Z, where Z is a normalizing constant, we need to enforce the following constraint: sum_x p(x|class) = 1, or, equivalently, sum_x (#(x, class) + a) = Z. This gives us Z = #(class) + a * N, where N is the number of distinct words in the vocabulary (just the number of words, not their occurrences!).
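
A minimal Java sketch of this smoothed estimate (names and parameters are illustrative):

    import java.util.Map;

    // Sketch of the Laplace-smoothed estimate p(x|class) = (#(x, class) + a) / (#(class) + a * N).
    public class SmoothedEstimate {

        /**
         * @param word        the word x
         * @param classCounts word -> number of occurrences in this class, i.e. #(x, class)
         * @param classSize   total number of word occurrences in this class, i.e. #(class)
         * @param vocabSize   number of distinct words across all classes, i.e. N
         * @param a           smoothing constant, e.g. 1.0 for add-one smoothing
         */
        static double wordProbability(String word, Map<String, Integer> classCounts,
                                      int classSize, int vocabSize, double a) {
            int count = classCounts.getOrDefault(word, 0);
            double z = classSize + a * vocabSize;   // normalizer Z = #(class) + a * N
            return (count + a) / z;                 // always > 0, sums to 1 over the vocabulary
        }
    }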

Upvotes: 2
