Arsenal Fanatic

Reputation: 3813

How to choose the features that describe x% of all information in data while using Incremental principal components analysis (IPCA)?

I'd like to use incremental principal component analysis (IPCA) to reduce my feature space so that it retains x% of the information in the data.

I would use sklearn.decomposition.IncrementalPCA(n_components=None, whiten=False, copy=True, batch_size=None). I can leave n_components=None so that it keeps all the features that I have.

But once the whole data set has been analyzed, how do I select the features that represent x% of the data, and how do I create a transform() for that number of features?

This idea is taken from this question.

Upvotes: 1

Views: 213

Answers (1)

maxymoo

Reputation: 36555

You can get the proportion of variance explained by each component of your PCA from the explained_variance_ratio_ attribute. For example, in the iris dataset the first 2 principal components account for about 98% of the variance in the data:

import numpy as np
from sklearn import decomposition
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data
pca = decomposition.IncrementalPCA()
pca.fit(X)
pca.explained_variance_ratio_

#array([ 0.92461621,  0.05301557,  0.01718514,  0.00518309])
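To go from these ratios to "the components that hold x% of the variance", one option is to take the cumulative sum and keep the smallest number of components that reaches the threshold. The following is only a sketch: the 0.95 threshold and the names target, cumulative, k and X_reduced are made up for illustration, and slicing the output of transform() is one possible way to keep the first k components, not necessarily the only or best one.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import IncrementalPCA

X = datasets.load_iris().data

# Fit with n_components=None so that every component is kept.
ipca = IncrementalPCA()
ipca.fit(X)

# Hypothetical threshold: keep enough components for 95% of the variance.
target = 0.95

# Running total of explained variance, component by component.
cumulative = np.cumsum(ipca.explained_variance_ratio_)

# Index of the first component at which the running total reaches the target,
# plus one to turn the index into a count.
k = int(np.searchsorted(cumulative, target)) + 1
# Guard against rounding when the target is (almost) 1.0.
k = min(k, len(cumulative))

# Components are ordered by explained variance, so the first k columns
# of the transformed data are the ones to keep.
X_reduced = ipca.transform(X)[:, :k]
```

The same selection should work if the model was fitted batch by batch with partial_fit(), since explained_variance_ratio_ is available once fitting is done. If you would rather have a model whose transform() returns only k columns, you could fit a second IncrementalPCA(n_components=k) on the data, at the cost of another pass.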

Upvotes: 1
