pooja

Reputation: 73

feature selection

I have document-term data with terms as dimensions. I have to perform feature selection on the terms, and I intend to use mutual information as the selection measure. My question is: after calculating the mutual information between all possible pairs, what should be done next? Should I set a threshold and select all the terms from the pairs that fall within the threshold?

Upvotes: 0

Views: 609

Answers (1)

kamaci

Reputation: 75257

If you want to use mutual information, you can consider using the mRMR (minimum Redundancy Maximum Relevance) algorithm. Algorithms of this kind select the features for you. What I mean is:

You have n features in your data set (i.e. n dimensions). If you want to use only the k most meaningful of the n features (k < n), you can use feature selection (e.g. mRMR, which is built on mutual information).
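A minimal sketch of greedy mRMR in its difference form, assuming binary term-presence features and discrete class labels: at each step it picks the term whose MI with the class label (relevance) minus its mean MI with the terms already chosen (redundancy) is largest. This is one way to use pairwise term-term MI values: as a redundancy penalty rather than against a fixed threshold. The names `mutual_information` and `mrmr` and the toy data are illustrative, not a library API.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (count / n) * math.log(count * n / (px[x] * py[y]))
    return mi


def mrmr(X, y, k):
    """Greedy mRMR (difference form): return indices of k selected columns of X.

    X is a list of rows (documents); each entry is a discrete term feature,
    e.g. 1 if the term occurs in the document and 0 otherwise.
    """
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    # Relevance: MI between each term and the class label.
    relevance = [mutual_information(col, y) for col in columns]
    selected = []
    remaining = set(range(n_features))
    while remaining and len(selected) < k:
        def score(j):
            if not selected:
                return relevance[j]
            # Redundancy: mean MI between candidate j and the already-selected terms.
            redundancy = sum(
                mutual_information(columns[j], columns[s]) for s in selected
            ) / len(selected)
            return relevance[j] - redundancy

        # sorted() makes ties resolve to the lowest column index.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: 8 documents, 3 terms. Term 1 duplicates term 0; term 2 marks
# the one positive document that term 0 misses.
X = [[1, 1, 0], [1, 1, 0], [1, 1, 0], [0, 0, 1],
     [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
print(mrmr(X, y, 2))  # [0, 2]
```

Ranking purely by term-class MI would keep terms 0 and 1 here, even though term 1 adds nothing once term 0 is chosen; the redundancy penalty promotes the weaker but complementary term 2 instead.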

How to choose k depends on your situation:

  • One consideration is that you don't want to use unnecessary features when building your model.

  • Another is that you want to avoid computational cost by removing some features from your data set.

You should evaluate your model after removing features. Check whether accuracy goes up; and, depending on your goal, even if accuracy goes down, the saving in computational cost may still make it worthwhile to eliminate some features.
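The evaluation loop described above could be sketched as follows, assuming you already have a ranking of term indices (for example from mRMR) and a held-out test set. A tiny nearest-centroid classifier stands in for whatever model you actually use, and `tolerance` (a hypothetical parameter) expresses how much accuracy you are willing to give up for a smaller k.

```python
def nearest_centroid_accuracy(X_train, y_train, X_test, y_test, cols):
    """Accuracy of a nearest-centroid classifier that only sees the columns in `cols`."""
    classes = sorted(set(y_train))
    centroids = {}
    for c in classes:
        rows = [x for x, label in zip(X_train, y_train) if label == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]

    def predict(x):
        v = [x[j] for j in cols]
        # Pick the class whose centroid is closest in squared Euclidean distance.
        return min(
            classes,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
        )

    correct = sum(predict(x) == label for x, label in zip(X_test, y_test))
    return correct / len(y_test)


def choose_k(ranking, X_train, y_train, X_test, y_test, ks, tolerance=0.0):
    """Return the smallest k whose held-out accuracy is within `tolerance` of the
    best one, together with the accuracy measured for every k in `ks`."""
    accuracies = {
        k: nearest_centroid_accuracy(X_train, y_train, X_test, y_test, ranking[:k])
        for k in ks
    }
    best = max(accuracies.values())
    chosen = min(k for k, acc in accuracies.items() if acc >= best - tolerance)
    return chosen, accuracies


# Toy data: term 0 decides the class, terms 1 and 2 are noise.
X_train = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]]
y_train = [1, 1, 0, 0]
X_test = [[1, 1, 1], [0, 0, 0]]
y_test = [1, 0]
print(choose_k([0, 1, 2], X_train, y_train, X_test, y_test, ks=[1, 2, 3]))
# (1, {1: 1.0, 2: 1.0, 3: 1.0})
```

Because accuracy does not improve beyond k = 1 here, the smallest k wins and the extra terms can be dropped. With real data, the ranking should be computed on the training split only, and cross-validation gives a more reliable estimate than a single held-out set.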

Alternatively, I suggest looking at feature extraction methods, e.g. PCA and LDA (especially for your case).
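Unlike feature selection, feature extraction builds new features as combinations of the original terms. A minimal PCA sketch with NumPy, assuming a small dense document-term matrix, is shown below; the function name `pca` is illustrative. LDA is not sketched here.

```python
import numpy as np


def pca(X, n_components):
    """Project the rows of X onto the top `n_components` principal components.

    Returns (projected data, explained-variance ratio of every component).
    """
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)  # PCA works on mean-centred columns
    # Rows of Vt are the principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_ratio = S**2 / np.sum(S**2)
    projected = X_centered @ Vt[:n_components].T
    return projected, explained_ratio


# Toy matrix: 4 documents, 2 terms whose counts move together,
# so a single component captures essentially all the variance.
Z, ratio = pca([[1, 2], [2, 4], [3, 6], [4, 8]], n_components=1)
print(Z.shape)  # (4, 1)
```

For large sparse document-term matrices, centering destroys sparsity, so a truncated SVD without centering (as in scikit-learn's `TruncatedSVD`, often called LSA) is commonly used instead.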

Upvotes: 2
