gowtham chintu

Reputation: 1

How to get the prominent words in a spam / non-spam classifier?

Suppose I have a spam / non-spam email classifier. If a new email has been classified as spam, how do I determine the words in the mail that are mainly responsible for the classifier classifying it as SPAM?

For example, if a mail has the following text :

Get 10000 dollars free by clicking here.

The main words responsible for classifying the mail as SPAM are "10000 dollars free".

Upvotes: -1

Views: 205

Answers (2)

Aditya

Reputation: 1135

I'm going to answer this question assuming that you have used the Naive Bayes classifier for classification.

The Naive Bayes classifier is a rather simple algorithm that has been successfully employed in the field of spam detection.

It is based on conditional probability and makes use of Bayes' theorem:

P (a|b) = P (b|a) * P (a) / P (b)

Suppose that there are two classes that a Naive Bayes classifier can classify a piece of text (an email) into: spam and not spam.

The equation mentioned above applied to the task of spam detection can be translated as follows:

P (class | text) = P (text | class) * P (class) / P (text)

Since the text is made up of words, it can be represented as a combination of words: text -> w1, w2, ..., wn

This translates to,

P (class | w1, w2, ..., wn) = P (w1, w2, ..., wn | class) * P (class) / P (w1, w2, ..., wn)

Since the Naive Bayes classifier makes the naive assumption that the words are conditionally independent of each other given the class, this translates to:

P (class | w1, w2, ... , wn) = P (w1 | class) * P (w2 | class) * ... * P (wn | class) * P (class)

This is evaluated for each class ('spam' and 'not spam' in our example).

I dropped the denominator since it is common to all classes and does not change which class scores highest.

Here, P (class) is the prior probability of a given class ('spam' or 'not spam'). Suppose you have 100 training examples of which 60 are spam and 40 are not spam; then the class probabilities of 'spam' and 'not spam' would be 0.6 and 0.4 respectively.

P (w | class) is the probability of a word given a class. In the Naive Bayes classifier you estimate it by counting how often each word occurs in the training examples of each class.
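
To make this concrete, here is a minimal sketch of how the priors and word probabilities could be estimated and combined into a score; the toy corpus, the add-one (Laplace) smoothing, and all the names are just for illustration:

    import math
    from collections import Counter

    # Toy labelled corpus -- hypothetical data, purely for illustration
    train = [
        ("get 10000 dollars free by clicking here", "spam"),
        ("free dollars waiting click now",          "spam"),
        ("meeting notes attached see you tomorrow", "not spam"),
        ("lunch tomorrow at noon",                  "not spam"),
    ]

    # P(class): fraction of training emails belonging to each class
    class_counts = Counter(label for _, label in train)
    priors = {c: n / len(train) for c, n in class_counts.items()}

    # word counts per class, used to estimate P(w | class)
    word_counts = {c: Counter() for c in class_counts}
    for text, label in train:
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}

    def p_word_given_class(word, cls):
        # add-one (Laplace) smoothing so unseen words never get probability 0
        return (word_counts[cls][word] + 1) / (sum(word_counts[cls].values()) + len(vocab))

    def log_score(text, cls):
        # log P(class) + sum of log P(w | class); logs avoid numerical underflow
        return math.log(priors[cls]) + sum(math.log(p_word_given_class(w, cls)) for w in text.split())

    email = "get 10000 dollars free by clicking here"
    print(max(priors, key=lambda c: log_score(email, c)))  # -> 'spam'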

Let's consider the example that you have given,

Get 10000 dollars free by clicking here.

The Naive Bayes classifier would have already calculated the probabilities of the words Get, 10000, dollars, free, by, clicking, here in your sentence for each class (spam and not spam).

If the sentence was classified as spam, then you can find the words which contributed most to that decision by comparing each word's probability in the spam and not-spam classes.
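
For instance, reusing the p_word_given_class helper from the sketch above, you could rank the words of the classified email by their log-likelihood ratio between the two classes; a positive score means the word pushed the decision towards spam:

    email = "get 10000 dollars free by clicking here"

    # how much more likely is each word under 'spam' than under 'not spam'?
    contributions = {
        w: math.log(p_word_given_class(w, "spam"))
           - math.log(p_word_given_class(w, "not spam"))
        for w in email.split()
    }

    for word, score in sorted(contributions.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{word:10s} {score:+.3f}")  # positive => evidence for spam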

Here you can find a Simple Naive Bayes implementation applied to the task of spam detection in emails.

Upvotes: 1

lejlot

Reputation: 66850

This fully depends on your model. However, I will give you a general, mathematical way, and then a few practical solutions.

Mathematical solution

Let us assume that your classifier is probabilistic, in the sense that it provides a support (confidence score) for its decision; this includes neural networks, Naive Bayes, LDA, logistic regression, etc.:

f(x) = P(ham|x)

Then if you want to answer "which dimension (feature) of x alters my decision the most", all you have to do is analyze the gradient (the gradient, being the vector of partial derivatives, shows which dimensions affect the output the most), thus:

most_important_feature_if_it_is_classified_as_ham = arg max_i (grad_x[f])_i

and symmetrically, if it is spam, then

most_important_feature_if_it_is_classified_as_spam = arg min_i (grad_x[f])_i

All you need is the ability to differentiate your model. This again is possible for many existing ones, such as neural nets, Naive Bayes, LDA, or logistic regression.
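
For instance, for logistic regression with f(x) = sigmoid(w . x + b), the gradient has the closed form grad_x[f] = f(x) * (1 - f(x)) * w, so ranking features by the gradient reduces to ranking the (positively scaled) weights. A minimal sketch, where the bag-of-words data and the toy labelling rule are assumptions purely for illustration:

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical bag-of-words data: 200 emails over a 6-word vocabulary;
    # label 1 = ham, 0 = spam, so the model below is f(x) = P(ham | x)
    X = rng.integers(0, 2, size=(200, 6)).astype(float)
    y = (X[:, 2] + X[:, 3] < 2).astype(float)  # toy rule: words 2 and 3 together mean spam

    # fit logistic regression f(x) = sigmoid(w . x + b) by gradient descent
    w, b = np.zeros(6), 0.0
    for _ in range(2000):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= 0.5 * X.T @ (p - y) / len(y)
        b -= 0.5 * float(np.mean(p - y))

    # gradient of f with respect to the input x: f(x) * (1 - f(x)) * w
    x = X[0]
    f = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    grad = f * (1 - f) * w

    print("feature pushing hardest toward ham :", int(np.argmax(grad)))
    print("feature pushing hardest toward spam:", int(np.argmin(grad)))  # 2 or 3 here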

Practical solutions

I list a few more or less direct methods of computing the above for typical models:

  • linear models (linear SVM, logistic regression, etc.) - you can simply look at your weight vector and take the arg max / arg min entry (as this is the exact value of the gradient).
  • Random forest - here you cannot differentiate, as you do not have a nice, continuous support function; but you can use the internal feature_importances_ measure (available in scikit-learn's implementation), which estimates how much each feature contributes to the trees' decisions (in scikit-learn, via the mean decrease in impurity over the training set).
  • other "black box" methods - you can use many approximation schemes to do feature importance analysis. In particular, you can easily estimate the gradient itself, simply, for each feature (word) iterate over your whole training set (or reasonable subset) - set this feature to 0 (and later to 1) and check how many classifications changed - this will give you a rough estimate of the importance of this feature (independently on the others).

Upvotes: 1
