frazman

Reputation: 33303

Machine Learning - Classification algorithm

I want to find the following probability:

P(y = 1 | n = k; theta)

Read as:

Probability that the prediction is class 1, given that the number of words = k, parametrized by theta

A traditional classification problem doesn't have the conditioning part (right?), just:

P(y = 1; theta)

How do I solve this?

EDIT:

For example, let's say I want to predict whether an email is spam or not based on the number of attachments. Let y=1 indicate spam and y=0 indicate non-spam.

So,

P(y = 1 | num_attachments = 0; some attributes)
and so on!!

Is it making any sense?

Upvotes: 0

Views: 415

Answers (2)

ffriend

Reputation: 28552

Normally, the number of attachments is just another attribute, so your probability is the same as

P(y = 1 | all attributes)
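For instance, here is a minimal sketch (using scikit-learn purely as an illustration; the feature names and training data are made up) of a model that takes num_attachments as one feature among others and outputs P(y = 1 | all attributes; theta):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features per email: [num_attachments, num_words, has_link]
X = np.array([[0, 120, 0],
              [3,  15, 1],
              [0,  80, 0],
              [5,  10, 1]])
y = np.array([0, 1, 0, 1])    # 1 = spam, 0 = non-spam

model = LogisticRegression()  # theta = the learned weights and intercept
model.fit(X, y)

# P(y = 1 | num_attachments=0, num_words=100, has_link=0; theta)
print(model.predict_proba([[0, 100, 0]])[0, 1])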

However, if you want to treat attachments specially (say, the other attributes are numeric and attachments are boolean), you can compute the probabilities separately and then combine them as:

P(C|A, B) = P(C|A) * P(C|B) / P(C)

where C stands for the event y = 1, A for attachments, and B for the other attributes.
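As a quick numeric sketch of the combination (the individual probabilities below are made-up placeholders, not values estimated from data; A and B are assumed conditionally independent given the class, and the combined score is only proportional to P(C | A, B), so it is renormalized over the two classes):

p_c = 0.30          # prior P(C), e.g. P(spam)
p_c_given_a = 0.80  # P(C | A), from attachments alone
p_c_given_b = 0.55  # P(C | B), from the other attributes alone

score_c = p_c_given_a * p_c_given_b / p_c
score_not_c = (1 - p_c_given_a) * (1 - p_c_given_b) / (1 - p_c)
print(score_c / (score_c + score_not_c))   # approximate P(C | A, B)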

See this paper for a description of several Naive Bayes classifiers.

Upvotes: 1

Matt Alcock

Reputation: 12901

Use a Naive Bayes classifier. You can code one yourself quite quickly, or use/look at the nltk library.
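For example, a minimal sketch with nltk (the feature names and training data are made up) that trains on the asker's attachment example and reads off P(y = spam | num_attachments = 0):

import nltk

# Made-up training data: featuresets are plain dicts, labels are 'spam'/'ham'
train = [
    ({'num_attachments': 0}, 'ham'),
    ({'num_attachments': 3}, 'spam'),
    ({'num_attachments': 0}, 'ham'),
    ({'num_attachments': 5}, 'spam'),
]

classifier = nltk.NaiveBayesClassifier.train(train)

# P(y = spam | num_attachments = 0) under the trained model
dist = classifier.prob_classify({'num_attachments': 0})
print(dist.prob('spam'))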

Upvotes: 1
