Reputation: 3813
I'd like to use incremental principal component analysis (IPCA) to reduce my feature space so that it retains x% of the information.
I would use sklearn.decomposition.IncrementalPCA(n_components=None, whiten=False, copy=True, batch_size=None).
I can leave n_components=None so that it works on all the features that I have.
But later, once the whole data set has been analyzed, how do I select the features that represent x% of the data, and how do I create a transform() for that number of features?
This idea is taken from this question.
Upvotes: 1
Views: 213
Reputation: 36555
You can get the fraction of variance explained by each component of your PCA from the explained_variance_ratio_ attribute. For example, in the iris dataset, the first 2 principal components account for about 98% of the variance in the data:
import numpy as np
from sklearn import decomposition
from sklearn import datasets
iris = datasets.load_iris()
X = iris.data
pca = decomposition.IncrementalPCA()
pca.fit(X)
pca.explained_variance_ratio_
#array([ 0.92461621, 0.05301557, 0.01718514, 0.00518309])
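To go from these ratios to a reduced feature space, one option is to take the cumulative sum of explained_variance_ratio_, find the smallest number of components k that reaches your threshold, and keep only the first k columns of the transformed data. This is a minimal sketch rather than a definitive recipe: the 0.95 threshold and the names target, cumulative, k and X_reduced are placeholders chosen for illustration, and slicing relies on the components being ordered by decreasing explained variance.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import IncrementalPCA

X = datasets.load_iris().data

# Fit with all components so the full variance spectrum is available.
pca = IncrementalPCA()
pca.fit(X)

# Illustrative threshold: keep enough components for 95% of the variance.
target = 0.95

# Running total of the variance explained by the first 1, 2, ... components.
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest k whose cumulative ratio reaches the target.
k = int(np.searchsorted(cumulative, target)) + 1
# Guard against rounding when the target is very close to 1.0.
k = min(k, len(cumulative))

# Project onto all components, then keep only the first k columns.
X_reduced = pca.transform(X)[:, :k]
print(k, X_reduced.shape)
```

If the data really is processed in batches with partial_fit, the same selection step should still apply once fitting has finished, since it only reads explained_variance_ratio_ from the fitted object.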
Upvotes: 1