Reputation: 39
I'm trying to decompose my columns using PCA.
I'm having trouble choosing the n_components argument of PCA in scikit-learn (Python). This is what I have so far:
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
Z = sc.fit_transform(X)
pca = PCA(n_components=5)
Could you please explain how to choose this value?
Upvotes: 4
Views: 14263
Reputation: 4172
No answer can tell you with certainty what the correct number of components is. It is application-specific.
However, there is a heuristic you can use: plot the cumulative explained variance ratio and choose the smallest number of components that "captures" at least 95% of the variance. In the following example, about 30 components capture around 95% of the variance.
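import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
pca = PCA().fit(load_digits().data)  # assuming digits is scikit-learn's load_digits()
plt.plot(np.cumsum(pca.explained_variance_ratio_))
plt.xlabel('number of components')
plt.ylabel('cumulative explained variance')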
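If you would rather read the number off programmatically than from the plot, here is a minimal sketch of the same 95% heuristic. It assumes the data is scikit-learn's bundled digits dataset; the names target, cumulative, k_manual and pca_95 are just illustrative. It also shows that PCA accepts a float between 0 and 1 as n_components, in which case it keeps just enough components to reach that fraction of explained variance.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Assumption: the bundled digits dataset (1797 samples, 64 pixel features)
X = load_digits().data

# Fit with all components, then find the first position where the
# cumulative explained variance ratio reaches the target.
target = 0.95
cumulative = np.cumsum(PCA().fit(X).explained_variance_ratio_)
k_manual = int(np.argmax(cumulative >= target)) + 1

# Shortcut: a float in (0, 1) tells PCA to keep just enough components
# to explain that fraction of the variance (needs the full SVD solver).
pca_95 = PCA(n_components=target, svd_solver="full").fit(X)

print(k_manual, pca_95.n_components_)
```

The 95% threshold is itself a convention, not a rule, so treat target as something to tune for your application. On your own data you would pass the standardized matrix Z instead of X.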
Upvotes: 13