Reputation:
I came across the following example in an old midterm exam by the well-known researcher Tom Mitchell:
Consider learning a classifier in a situation with 1000 features total. 50 of them are truly informative about class. Another 50 features are direct copies of the first 50 features. The final 900 features are not informative. Assume there is enough data to reliably assess how useful features are, and the feature selection methods are using good thresholds.
How many features will be selected by mutual information filtering?
Solution: 100
How many features will be selected by a wrapper method?
solution: 50
My question is how these solutions are derived. I have tried many approaches, but I couldn't understand the idea behind them.
Upvotes: 6
Views: 833
Reputation: 159
How many features will be selected by mutual information filtering?
Going by the question description, only 50 features are truly needed. However, this filtering scores each feature only by its relationship with the variable to predict, and one of the major drawbacks of mutual information filtering is that it tends to select redundant variables, because it does not consider the relationships between features.
How many features will be selected by a wrapper method?
Consider it as a heuristic search over the space of all possible feature subsets. By definition, "A wrapper method evaluates a subset of features, thus it takes the interactions between features into account."
Example: hill climbing, i.e. keep adding features one at a time until no further improvement can be achieved.
Since we have 50 features which carry the information, another 50 that are copies of the former, and 900 features of no use, we end up with only 50 features.
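The hill-climbing idea can be sketched as greedy forward selection. This is a scaled-down, illustrative version (3 informative features, 3 copies, 14 noise features instead of 50/50/900), assuming scikit-learn is available; the classifier, cross-validation setup, and improvement tolerance are all my own illustrative choices, not part of the original question:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 1000
informative = rng.normal(size=(n, 3))                 # features 0-2
y = (informative.sum(axis=1) > 0).astype(int)
# features 3-5 are exact copies of 0-2; features 6-19 are noise
X = np.hstack([informative, informative.copy(), rng.normal(size=(n, 14))])

def cv_acc(features):
    """Cross-validated accuracy of a logistic model on a feature subset."""
    if not features:
        return max(y.mean(), 1 - y.mean())            # majority-class baseline
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X[:, features], y, cv=5).mean()

# Hill climbing: repeatedly add the single best feature until no
# candidate improves accuracy by more than a small tolerance.
selected, best = [], cv_acc([])
improved = True
while improved:
    improved = False
    candidates = [f for f in range(X.shape[1]) if f not in selected]
    score, f = max((cv_acc(selected + [f]), f) for f in candidates)
    if score > best + 0.005:
        selected.append(f)
        best = score
        improved = True

print(sorted(selected))  # three features: one from each duplicate pair, no noise
```

Once one member of a duplicate pair is in the subset, adding its copy yields no accuracy gain, so the search stops at one feature per informative pair.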
Upvotes: -1
Reputation: 16104
How many features will be selected by mutual information filtering?
Mutual information feature selection evaluates the candidacy of each feature independently. Since there are essentially 100 informative features (the 50 original features and their 50 copies), we will end up with 100 features after mutual information filtering.
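A small demonstration of this, assuming scikit-learn is available. It uses a scaled-down version of the exam setup (3 informative features, 3 exact copies, 94 noise features, 100 total); the synthetic data and threshold logic are illustrative assumptions:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
n = 2000
informative = rng.normal(size=(n, 3))           # features 0-2
y = (informative.sum(axis=1) > 0).astype(int)   # class depends on all 3
copies = informative.copy()                     # features 3-5: direct duplicates
noise = rng.normal(size=(n, 94))                # features 6-99: uninformative
X = np.hstack([informative, copies, noise])

# Score every feature independently against y.
mi = mutual_info_classif(X, y, random_state=0)

# A copy has essentially the same estimated MI with y as its original, so
# any threshold that keeps an informative feature also keeps its copy.
top = np.argsort(mi)[::-1][:6]
print(sorted(top))  # the six best-scoring features are the 3 originals + 3 copies
```

Because each feature is scored in isolation, the filter has no way to notice that half of the high-scoring features are redundant.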
How many features will be selected by a wrapper method?
A wrapper method evaluates a subset of features, thus it takes the interactions between features into account. Since 50 features are direct copies of the other 50 features, the wrapper method is able to find out that, conditioned on the first 50 features, the second set of 50 features adds no extra information at all. We end up with 50 features after selection. Suppose the first set of 50 features is A1, A2, ..., A50 and the copies are C1, C2, ..., C50. The final set of selected features might look like:
A1, C2, A3, A4, C5, C6, ..., A48, A49, C50.
Thus each unique feature should have only one occurrence (either from feature set A or from feature set C).
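The "conditioned on A, C adds nothing" point can be checked directly: a model trained on the originals alone scores the same as one trained on originals plus copies. A minimal sketch, assuming scikit-learn and a scaled-down setup (3 originals A, 3 copies C):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 1500
A = rng.normal(size=(n, 3))               # the informative features A1..A3
C = A.copy()                              # exact copies C1..C3
y = (A.sum(axis=1) > 0).astype(int)

clf = LogisticRegression(max_iter=1000)
acc_A = cross_val_score(clf, A, y, cv=5).mean()
acc_AC = cross_val_score(clf, np.hstack([A, C]), y, cv=5).mean()

# The copies carry zero extra information given A, so the two
# cross-validated accuracies are essentially identical.
print(round(acc_A, 3), round(acc_AC, 3))
```

This is exactly the signal a wrapper method exploits to discard the redundant half of the feature set.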
Upvotes: 7