Reputation: 41
I have to learn information gain for feature selection right now, but I don't have a clear understanding of it. I am a newbie and I'm confused about it.
How to use IG in feature selection (manual calculation)?
The only clue I have is the formula below. Can anyone help me understand how to use it?
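I believe it is the standard information-gain formula for a term t over classes c_1, ..., c_m:

```latex
IG(t) = -\sum_{i=1}^{m} P(c_i)\log_2 P(c_i)
        + P(t)\sum_{i=1}^{m} P(c_i \mid t)\log_2 P(c_i \mid t)
        + P(\bar{t})\sum_{i=1}^{m} P(c_i \mid \bar{t})\log_2 P(c_i \mid \bar{t})
```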
And this is the example:
Upvotes: 4
Views: 6331
Reputation: 680
The formula comes from mutual information. In this case, you can think of mutual information as how much information the presence of the term t gives us for guessing the class.
Check: https://nlp.stanford.edu/IR-book/html/htmledition/mutual-information-1.html
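From that chapter, the mutual information between term occurrence (U = 1 if the document contains t) and class membership (C = 1 if the document belongs to class c) is:

```latex
I(U;C) = \sum_{e_t \in \{1,0\}} \sum_{e_c \in \{1,0\}}
         P(U = e_t, C = e_c)\,\log_2 \frac{P(U = e_t, C = e_c)}{P(U = e_t)\,P(C = e_c)}
```

The terms whose presence or absence tells you the most about the class get the highest scores, so you keep the top-scoring terms as features.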
Upvotes: 0
Reputation: 37741
How to use information gain in feature selection?
Information gain, InfoGain(t), measures the number of bits of information obtained for predicting a class c by knowing the presence or absence of a term t in a document.
Concisely, the information gain is a measure of the reduction in entropy of the class variable after the value for the feature is observed. In other words, information gain for classification is a measure of how common a feature is in a particular class compared to how common it is in all other classes.
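As a sketch in symbols (notation mine), with C the class variable and t a term:

```latex
IG(C; t) = H(C) - H(C \mid t),
\qquad H(C) = -\sum_{c} P(c)\log_2 P(c)
```

where H(C | t) = P(t) H(C | t present) + P(t̄) H(C | t absent) is the class entropy that remains after you split the documents on whether they contain t.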
In text classification, a feature means a term that appears in the documents (the corpus). Consider two terms in the corpus, term1 and term2. If term1 reduces the entropy of the class variable by a larger amount than term2, then term1 is more useful than term2 for document classification.
Example in the context of sentiment classification
A word that occurs primarily in positive movie reviews and rarely in negative reviews carries a lot of information. For example, the presence of the word “magnificent” in a movie review is a strong indicator that the review is positive. That makes “magnificent” a highly informative word.
Compute entropy and information gain in python
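Here is a minimal sketch, assuming each document is represented as a set of terms; the toy corpus and function names are illustrative:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    if not labels:
        return 0.0
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(docs, labels, term):
    """Reduction in class entropy after observing whether `term`
    is present or absent in each document."""
    with_term = [lab for doc, lab in zip(docs, labels) if term in doc]
    without_term = [lab for doc, lab in zip(docs, labels) if term not in doc]
    p_t = len(with_term) / len(docs)
    h_conditional = (p_t * entropy(with_term)
                     + (1 - p_t) * entropy(without_term))
    return entropy(labels) - h_conditional

# Toy corpus: three positive and three negative movie reviews,
# each represented as a set of terms.
docs = [{"magnificent", "plot"}, {"magnificent", "acting"}, {"great", "film"},
        {"boring", "plot"}, {"terrible", "acting"}, {"boring", "film"}]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

# "magnificent" occurs only in positive reviews -> high information gain.
print(information_gain(docs, labels, "magnificent"))  # ~0.459
# "plot" occurs once in each class -> zero information gain.
print(information_gain(docs, labels, "plot"))         # 0.0
```

To select features, you would compute this score for every term in the vocabulary and keep the top-scoring ones.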
Upvotes: 2