Dimension reduction using PCA while preserving a given percentage of variance

Question

I am trying to reduce the dimensionality of the MNIST dataset using PCA. The catch is that I have to preserve a certain percentage of the variance (say 80%) while reducing the dimension. I am using scikit-learn. I tried pca.get_variance ratio, but it gives me the same values with the decimal point in different places, like 9.7 or .97 or .097. I also tried pca.get_variance(), but I assume that's not the answer. My question is: how do I make sure the reduced data preserves a given percentage of the variance?

bogatron · Accepted Answer

If you apply PCA without passing the n_components argument, all components are kept, and the explained_variance_ratio_ attribute of the PCA object will give you the information you need. This attribute gives the fraction of the total variance associated with each principal component (eigenvector), sorted in decreasing order. Here is an example copied directly from the current stable PCA documentation (it passes n_components=2, but since X has only two features, that still keeps every component):

>>> import numpy as np
>>> from sklearn.decomposition import PCA
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> pca = PCA(n_components=2)
>>> pca.fit(X)
PCA(copy=True, n_components=2, whiten=False)
>>> print(pca.explained_variance_ratio_) 
[ 0.99244...  0.00755...]

In your case, if you apply np.cumsum to the explained_variance_ratio_ attribute, then the number of principal components you need to keep corresponds to the position of the first element in np.cumsum(pca.explained_variance_ratio_) that is greater than or equal to 0.8.
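
The cumulative-sum approach can be sketched as follows. This is a minimal illustration, not a definitive recipe: the synthetic array X stands in for your flattened MNIST images, and the names target, cumulative and n_keep are made up for this example. The last lines rely on scikit-learn's documented behaviour of accepting a float between 0 and 1 for n_components (with svd_solver="full"), which picks the number of components for you; it should normally agree with the manual count.

```python
import numpy as np
from sklearn.decomposition import PCA

# Assumption: synthetic data standing in for the real dataset
# (200 samples, 10 features with different scales).
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 10)) * np.arange(1, 11)

target = 0.80  # fraction of variance to preserve

# Fit with all components so the ratios cover the total variance.
pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Index of the first cumulative value >= target, plus one,
# gives the number of components to keep.
n_keep = int(np.argmax(cumulative >= target)) + 1

# Refit with that many components and project the data.
X_reduced = PCA(n_components=n_keep).fit_transform(X)
print(n_keep, X_reduced.shape, cumulative[n_keep - 1])

# Shortcut: let scikit-learn choose the count from the variance fraction.
pca_auto = PCA(n_components=target, svd_solver="full").fit(X)
print(pca_auto.n_components_)
```

Fitting once with all components and reading off the cumulative sum also makes it cheap to compare several thresholds (80%, 90%, 95%) without refitting each time.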
