We know there are algorithms that reduce the dimensionality of data sets, such as PCA and Isomap.
Let's say we have a data set with 100,000 attributes, like the Dorothea Data Set (chemical compounds represented by structural molecular features must be classified as active, i.e. binding to thrombin, or inactive; this is one of five datasets of the NIPS 2003 feature selection challenge):
Data Set Characteristics: Multivariate
Number of Instances: 1950
Area: Life
Attribute Characteristics: Integer
Number of Attributes: 100000
Date Donated: 2008-02-29
Associated Tasks: Classification
Missing Values: N/A
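For concreteness, the two methods named above can be run back to back with scikit-learn. This is a hedged sketch on a random toy matrix, not the real Dorothea data; the sizes, the `n_components=10`, and the idea of feeding Isomap the PCA output (a common trick for very wide data) are all illustrative assumptions.

```python
# Sketch: PCA (linear) then Isomap (nonlinear) on a toy wide matrix.
# A real 100,000-feature set like Dorothea would usually get a linear
# reduction first, since Isomap's neighbor graph is costly in high dimensions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 2000))  # toy stand-in, NOT the actual dataset

X_pca = PCA(n_components=10).fit_transform(X)  # linear projection to 10 dims
# Isomap applied to the PCA output rather than the raw matrix (assumption)
X_iso = Isomap(n_components=10, n_neighbors=10).fit_transform(X_pca)

print(X_pca.shape, X_iso.shape)  # (300, 10) (300, 10)
```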
Maximum Variance Unfolding is a particularly popular technique these days. A similar approach called Structure Preserving Embedding got best paper at ICML 2009. A few other techniques include Laplacian Eigenmaps, Locally Linear Embedding, and Kernel PCA.
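Of the techniques listed, Kernel PCA, Locally Linear Embedding, and Laplacian Eigenmaps (as `SpectralEmbedding`) ship with scikit-learn; Maximum Variance Unfolding and Structure Preserving Embedding do not, so this sketch covers only the former three, on a random toy matrix with assumed sizes and neighbor counts.

```python
# Sketch: three of the listed embeddings via scikit-learn.
# MVU and SPE are not in scikit-learn and are omitted here.
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.manifold import LocallyLinearEmbedding, SpectralEmbedding

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1000))  # toy stand-in for a wide data matrix

# Each maps the 1000-dim points down to 2 dims
X_kpca = KernelPCA(n_components=2, kernel="rbf").fit_transform(X)
X_lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10).fit_transform(X)
X_lap = SpectralEmbedding(n_components=2, n_neighbors=10).fit_transform(X)

print(X_kpca.shape, X_lle.shape, X_lap.shape)  # each (200, 2)
```

All three share the same `fit_transform` interface, so swapping estimators to compare embeddings is a one-line change.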
Specific to MATLAB, you can take some ideas from the manual of its Statistics Toolbox.
Look for the Feature Selection and Feature Transformation sections. I would also try SVD, FastMap, and RobustMap. You'll need to read a bit about each and decide which one is most suitable for your data.
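Of those suggestions, SVD is the easiest to try outside MATLAB as well. A hedged Python sketch, assuming a sparse toy matrix standing in for data like Dorothea's 100,000 mostly-zero binary features: truncated SVD works directly on sparse input without densifying it.

```python
# Sketch: truncated SVD on a wide, sparse toy matrix.
# The shape, density, and component count are illustrative assumptions.
from scipy.sparse import random as sparse_random
from sklearn.decomposition import TruncatedSVD

X = sparse_random(500, 100_000, density=0.001, random_state=0)  # toy sparse stand-in

svd = TruncatedSVD(n_components=50, random_state=0)
X_reduced = svd.fit_transform(X)  # dense (500, 50) result

print(X_reduced.shape)  # (500, 50)
```

Unlike full PCA, `TruncatedSVD` never centers the data, which is exactly what lets it keep the input sparse.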