Reputation: 33303
I want to find the following probability:
P(y = 1 | n = k; theta)
Read as:
The probability that the prediction is class 1, given that the number of words = k, parametrized by theta.
A traditional classifier doesn't have the conditional part, right? It just has
P(y = 1; theta)
How do I solve this?
EDIT:
For example, let's say I want to predict whether an email is spam or not based on the number of attachments.
Let y = 1 indicate spam and y = 0 indicate non-spam.
So,
P(y = 1 | num_attachments = 0; some parameters)
and so on!
Does this make sense?
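(For concreteness, here is a minimal sketch of how such a conditional probability could be estimated by counting over labeled data. The data and function name are made up for illustration, not from any particular library:)

```python
# Toy labeled data: (num_attachments, is_spam) pairs -- hypothetical values
emails = [(0, 0), (0, 0), (0, 1), (2, 1), (3, 1), (1, 0), (2, 1), (0, 0)]

def p_spam_given_attachments(data, k):
    """Empirical estimate of P(y=1 | num_attachments=k)."""
    labels = [y for n, y in data if n == k]
    if not labels:
        return None  # no emails with exactly k attachments observed
    return sum(labels) / len(labels)

print(p_spam_given_attachments(emails, 0))  # P(y=1 | num_attachments=0)
```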
Upvotes: 0
Views: 415
Reputation: 28552
Normally the number of attachments is just another attribute, so your probability is the same as
P(y = 1 | all attributes)
However, if attachments need special treatment (say, the other attributes are numeric and attachments is boolean), you can compute the probabilities separately and then combine them, up to a normalizing constant, as:
P(C|A, B) ∝ P(C|A) * P(C|B) / P(C)
where C stands for the event y = 1, A for the attachments, and B for the other attributes.
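As a sketch, the combination could look like this in Python (all probability values below are made-up placeholders; since the formula holds only up to a normalizing constant, the two class scores are renormalized at the end):

```python
def combined_score(p_c_given_a, p_c_given_b, p_c):
    """Unnormalized score for P(C | A, B), assuming A and B are
    conditionally independent given C: P(C|A) * P(C|B) / P(C)."""
    return p_c_given_a * p_c_given_b / p_c

# Scores for the two classes y=1 (spam) and y=0 (non-spam)
score_spam = combined_score(0.7, 0.6, 0.4)  # P(C|A)=0.7, P(C|B)=0.6, P(C)=0.4
score_ham  = combined_score(0.3, 0.4, 0.6)

# Renormalize so the two posteriors sum to 1
total = score_spam + score_ham
print(score_spam / total)  # estimate of P(y=1 | A, B)
```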
See this paper for a description of several Naive Bayes classifiers.
Upvotes: 1
Reputation: 12901
Use a Naive Bayes classifier. You can code one yourself quite quickly, or use/look at the nltk library.
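For example, with nltk's built-in NaiveBayesClassifier (the feature name and training data here are illustrative, matching the spam example from the question):

```python
import nltk

# Toy training data: each example is (feature dict, label) -- values are made up
train_set = [
    ({'num_attachments': 0}, 'ham'),
    ({'num_attachments': 0}, 'ham'),
    ({'num_attachments': 2}, 'spam'),
    ({'num_attachments': 3}, 'spam'),
    ({'num_attachments': 1}, 'ham'),
]

classifier = nltk.NaiveBayesClassifier.train(train_set)

# Estimate P(y=spam | num_attachments=0) for a new email
dist = classifier.prob_classify({'num_attachments': 0})
print(dist.prob('spam'))
```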
Upvotes: 1