edgarmtze

Reputation: 25048

State of the art of dimensionality reduction algorithms

We know there are algorithms to reduce the dimensionality of data sets, such as PCA and Isomap.

Let's say we have a data set with 100,000 attributes, like the Dorothea Data Set (chemical compounds represented by structural molecular features must be classified as active (binding to thrombin) or inactive; this is one of five datasets of the NIPS 2003 feature selection challenge):

Data Set Characteristics:   Multivariate

Number of Instances:        1950

Area:                       Life

Attribute Characteristics:  Integer

Number of Attributes:       100000

Date Donated                2008-02-29

Associated Tasks:           Classification

Missing Values?             N/A

Number of Web Hits:         17103

Upvotes: 2

Views: 498

Answers (2)

fairidox

Reputation: 3428

Maximum Variance Unfolding is a particularly popular technique these days. A similar approach called Structure Preserving Embedding got best paper at ICML 2009. A few other techniques include Laplacian Eigenmaps, Locally Linear Embedding, and Kernel PCA.
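As a rough illustration of the last two techniques mentioned, here is a minimal sketch using scikit-learn's implementations of Kernel PCA and Locally Linear Embedding. The random matrix is a stand-in for your own data, and the dimensions and parameters (10 components, RBF kernel, 12 neighbors) are arbitrary choices for the example, not recommendations:

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.manifold import LocallyLinearEmbedding

# Toy stand-in for a high-dimensional data set: 200 samples, 500 features.
rng = np.random.RandomState(0)
X = rng.rand(200, 500)

# Kernel PCA with an RBF kernel, reducing to 10 dimensions.
kpca = KernelPCA(n_components=10, kernel="rbf")
X_kpca = kpca.fit_transform(X)

# Locally Linear Embedding, also reducing to 10 dimensions.
lle = LocallyLinearEmbedding(n_components=10, n_neighbors=12)
X_lle = lle.fit_transform(X)

print(X_kpca.shape)  # (200, 10)
print(X_lle.shape)   # (200, 10)
```

Note that neighborhood-based methods like LLE build a nearest-neighbor graph over the samples, so their cost grows with the number of instances as well as the number of attributes.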

Upvotes: 0

user656781


If you use Matlab, you can get some ideas from the manual of its Statistics Toolbox.

Look for the Feature Selection and Feature Transformation sections. I would also try SVD, FastMap, and RobustMap. You'll need to read a bit about each and decide which one is most suitable for your data.
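For data like Dorothea (sparse binary features, far more attributes than instances), truncated SVD is a natural first try because it operates directly on sparse matrices without densifying them. A sketch, assuming scikit-learn and a small random sparse matrix in place of the real 1950 x 100,000 data:

```python
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.decomposition import TruncatedSVD

# Sparse toy matrix standing in for Dorothea-style data; the real data
# would be 1950 instances x 100,000 attributes, but a smaller matrix
# keeps the sketch fast.
rng = np.random.RandomState(0)
X = sparse_random(500, 5000, density=0.01, random_state=rng, format="csr")

# Truncated SVD accepts sparse input directly, unlike plain PCA,
# which would require centering (and therefore densifying) the data.
svd = TruncatedSVD(n_components=50, random_state=0)
X_reduced = svd.fit_transform(X)

print(X_reduced.shape)  # (500, 50)
```

The number of components (50 here) is a tuning choice; a common heuristic is to inspect `svd.explained_variance_ratio_` and keep enough components to cover most of the variance.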

Upvotes: 1
