Hossein

Reputation: 41841

Feature Selection in MATLAB

I have a dataset for text classification ready to be used in MATLAB. Each document is a vector in this dataset, and the dimensionality of these vectors is extremely high. In such cases people usually apply some feature selection to the vectors, like the methods found in the WEKA toolkit. Is there anything like that in MATLAB? If not, can you suggest an algorithm for me to do it? Thanks.

Upvotes: 6

Views: 19651

Answers (3)

Amro

Reputation: 124563

MATLAB (and its toolboxes) include a number of functions that deal with feature selection:

You can also find examples that demonstrate their usage on real datasets:

In addition, there exist third-party toolboxes:

Otherwise, you can always call your favorite functions from WEKA directly from MATLAB, since it includes a JVM.
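As a rough illustration of the built-in route, here is a minimal sketch using `sequentialfs` from the Statistics and Machine Learning Toolbox. The synthetic data, the choice of `classify` (linear discriminant analysis) as the wrapped classifier, and the 5-fold cross-validation are all assumptions made for the example, not something taken from the answer. Wrapper methods like this can be slow on very high-dimensional text data, so treat it as a starting point only.

```matlab
% Sketch only: forward feature selection with sequentialfs.
% Requires the Statistics and Machine Learning Toolbox.
rng(0);                                  % reproducible synthetic data
n = 200;
X = randn(n, 10);                        % 10 candidate features
y = double(X(:,3) + X(:,7) > 0);         % labels depend on features 3 and 7

% Criterion: number of misclassified held-out samples for an LDA classifier.
errfun = @(Xtr, ytr, Xte, yte) sum(yte ~= classify(Xte, Xtr, ytr));

cvp  = cvpartition(y, 'KFold', 5);       % stratified 5-fold partition
opts = statset('Display', 'off');

% selected is a logical row vector marking the chosen columns of X.
[selected, history] = sequentialfs(errfun, X, y, 'cv', cvp, 'options', opts);

disp(find(selected));                    % indices of the selected features
Xreduced = X(:, selected);               % reduced data set
```

The criterion function can wrap any classifier, so the same pattern applies if you prefer a different model than LDA.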

Upvotes: 12

Will Dwinnell

Reputation: 11

You might consider using the independent features technique of Weiss and Kulikowski to quickly eliminate variables which are obviously uninformative:

http://matlabdatamining.blogspot.com/2006/12/feature-selection-phase-1-eliminate.html
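A minimal sketch of the idea for a two-class problem, assuming the usual formulation of the test: each feature is scored by the difference of its class means divided by the standard error of that difference, and features scoring below a threshold are dropped. The synthetic data and the threshold of 2 are assumptions for illustration; the linked post is the reference for the details.

```matlab
% Sketch only: independent (per-feature) significance screening, two classes.
% Uses base MATLAB functions only.
rng(1);                                   % reproducible synthetic data
n = 100;
X = randn(2*n, 5);                        % 5 candidate features
y = [zeros(n,1); ones(n,1)];              % class labels 0 and 1
X(y == 1, 2) = X(y == 1, 2) + 3;          % only feature 2 separates the classes

A = X(y == 0, :);                         % samples of class 0
B = X(y == 1, :);                         % samples of class 1

% Standard error of the difference between the class means, per feature.
se  = sqrt(var(A, 0, 1) / size(A, 1) + var(B, 0, 1) / size(B, 1));
sig = abs(mean(A, 1) - mean(B, 1)) ./ se; % significance score per feature

threshold = 2;                            % assumed cutoff
keep = sig > threshold;                   % logical mask of retained features
Xreduced = X(:, keep);
```

Each feature is tested on its own, so this is cheap even for very wide data, but it cannot detect features that are only useful in combination. Constant features give a zero standard error and should be removed beforehand.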

Upvotes: 1

Prasanna

Reputation: 105

Feature selection depends on the specific task you want to do on the text data.

One of the simplest and crudest methods is to use principal component analysis (PCA) to reduce the dimensionality of the data. The reduced-dimensional data can then be used directly as features for classification.

See the tutorial on using PCA here:

http://matlabdatamining.blogspot.com/2010/02/principal-components-analysis.html

Here is the link to the MATLAB PCA command help:

http://www.mathworks.com/help/toolbox/stats/princomp.html

Using the obtained features, the well-known support vector machine (SVM) can be used for classification:

http://www.mathworks.com/help/toolbox/bioinfo/ref/svmclassify.html

http://www.autonlab.org/tutorials/svm.html
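The PCA-then-SVM pipeline described above might look roughly like the sketch below. It assumes a recent Statistics and Machine Learning Toolbox, where `pca` and `fitcsvm` are the documented functions to use in place of `princomp` and `svmclassify`. The synthetic data, the 150/50 split, and the 90% explained-variance cutoff are assumptions chosen for the example.

```matlab
% Sketch only: reduce dimensionality with PCA, then classify with an SVM.
% Requires the Statistics and Machine Learning Toolbox.
rng(0);                                    % reproducible synthetic data
n = 200; d = 50;
X = randn(n, d);
y = double(X(:,1) + X(:,2) > 0);           % two-class labels
X(:, 1:2) = 5 * X(:, 1:2);                 % informative columns carry more variance

isTrain = (1:n)' <= 150;                   % first 150 rows for training
isTest  = ~isTrain;

% Fit PCA on the training rows only; mu holds the training column means.
[coeff, scoreTrain, ~, ~, explained, mu] = pca(X(isTrain, :));

% Keep the smallest number of components explaining at least 90% of variance.
k = find(cumsum(explained) >= 90, 1);
Ztrain = scoreTrain(:, 1:k);

% Project the test rows with the training mean and loadings.
Ztest = bsxfun(@minus, X(isTest, :), mu) * coeff(:, 1:k);

model = fitcsvm(Ztrain, y(isTrain), 'KernelFunction', 'linear');
acc   = mean(predict(model, Ztest) == y(isTest));
fprintf('Components kept: %d, test accuracy: %.2f\n', k, acc);
```

Fitting PCA on the training rows and reusing `mu` and `coeff` for the test rows keeps information from the test set out of the projection.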

Upvotes: 1
