Reputation: 498
I am trying to reduce the dimensionality of the MNIST dataset using PCA. The catch is that I have to preserve a certain percentage of the variance (say 80%) while reducing the dimension. I am using scikit-learn. I tried pca.get_variance ratio, but it gives me the same values with the decimal point in different places, like 9.7, .97, or .097. I also tried pca.get_variance(), but I assume that's not the answer. My question is: how do I ensure that the reduced data preserves a given percentage of the variance?
Upvotes: 2
Views: 1971
Reputation: 19169
If you apply PCA without passing the n_components argument, all components are kept, and the explained_variance_ratio_ attribute of the fitted PCA object will give you the information you need. Each entry of this attribute is the fraction of the total variance associated with the corresponding eigenvector (principal component). Here is an example copied directly from the current stable PCA documentation (the data has only two features, so n_components=2 also keeps every component):
>>> import numpy as np
>>> from sklearn.decomposition import PCA
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> pca = PCA(n_components=2)
>>> pca.fit(X)
PCA(copy=True, n_components=2, whiten=False)
>>> print(pca.explained_variance_ratio_)
[ 0.99244... 0.00755...]
In your case, apply np.cumsum to the explained_variance_ratio_ attribute. The number of principal components you need to keep is the 1-based position (that is, the 0-based index plus one) of the first element in np.cumsum(pca.explained_variance_ratio_) that is greater than or equal to 0.8.
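The cumulative-sum approach can be sketched as follows. This is a minimal illustration rather than a definitive recipe: the random matrix X is only a stand-in for the MNIST data, and the names target, n_keep, and pca_reduced are made up for this example. The last lines cross-check the result against scikit-learn's option of passing a float between 0 and 1 as n_components with svd_solver='full', which, as far as I know, selects the number of components needed to explain more than that fraction of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in data (assumption): 500 samples, 64 features with decreasing scale.
# Replace X with your flattened MNIST images.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64)) * np.linspace(10.0, 0.1, 64)

target = 0.80  # fraction of variance to preserve

# Fit with all components so the ratios cover the total variance.
pca_full = PCA(svd_solver="full").fit(X)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

# argmax on a boolean array gives the index of the first True;
# adding 1 turns that 0-based index into a component count.
n_keep = int(np.argmax(cumulative >= target)) + 1

# Refit with only the components that are needed and project the data.
pca_reduced = PCA(n_components=n_keep, svd_solver="full").fit(X)
X_reduced = pca_reduced.transform(X)
retained = pca_reduced.explained_variance_ratio_.sum()
print(n_keep, X_reduced.shape, retained)

# Cross-check: let scikit-learn choose the count from the variance fraction.
pca_auto = PCA(n_components=target, svd_solver="full").fit(X)
print(pca_auto.n_components_)
```

If the two counts agree on your data, the float form of n_components is the shorter way to get the same reduction; the explicit cumulative sum is still useful when you want to plot or inspect how variance accumulates.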
Upvotes: 1