Mathilde

Reputation: 39

How to choose the number of components for PCA in scikit-learn

I'm trying to decompose my columns using PCA.

I'm having difficulty choosing the n_components parameter of PCA in scikit-learn (Python). This is what I did:

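from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
Z = sc.fit_transform(X)  # standardize each column of my data X
pca = PCA(n_components=5)  # 5 was an arbitrary guess
Z_pca = pca.fit_transform(Z)  # project onto the first 5 components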

Can you please explain how I should choose it?

Upvotes: 4

Views: 14263

Answers (1)

Farseer

Reputation: 4172

No rule will tell you with certainty what the correct number of components is; it is application specific.

However, there is a heuristic you can use: plot the cumulative explained variance ratio and choose the number of components that captures at least 95% of the variance. In the following example, which uses scikit-learn's digits dataset, around 30 components capture about 95% of the variance.

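import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
pca = PCA().fit(load_digits().data)  # keep all 64 components
plt.plot(np.cumsum(pca.explained_variance_ratio_))
plt.xlabel('number of components')
plt.ylabel('cumulative explained variance')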

[Plot: cumulative explained variance vs. number of components]
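Instead of reading the value off the plot, you can compute it directly. The sketch below is one way to do that, assuming your data has already been standardized as in the question; the helper name choose_n_components is hypothetical. It finds the smallest number of components whose cumulative explained variance ratio exceeds the threshold, then cross-checks the result against scikit-learn's own selection, which PCA performs when n_components is a float between 0 and 1 (used here together with svd_solver="full").

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def choose_n_components(Z, threshold=0.95):
    """Hypothetical helper: smallest number of components whose
    cumulative explained variance ratio exceeds `threshold`."""
    pca = PCA(svd_solver="full").fit(Z)  # keep every component
    cumulative = np.cumsum(pca.explained_variance_ratio_)
    # searchsorted(side="right") returns the first index where the
    # cumulative ratio is strictly greater than the threshold; +1 turns
    # that 0-based index into a component count.
    k = int(np.searchsorted(cumulative, threshold, side="right") + 1)
    # Guard against thresholds the data cannot exceed (e.g. 1.0).
    return min(k, len(cumulative))


# Example data: the digits dataset from the answer (64 features),
# standardized the same way as in the question.
X = load_digits().data
Z = StandardScaler().fit_transform(X)

k = choose_n_components(Z, threshold=0.95)

# scikit-learn can make the same choice itself when n_components
# is a float between 0 and 1.
pca_auto = PCA(n_components=0.95, svd_solver="full").fit(Z)
Z_reduced = pca_auto.transform(Z)

print("components needed:", k, pca_auto.n_components_)
print("reduced shape:", Z_reduced.shape)
```

Passing the fraction directly (PCA(n_components=0.95, ...)) is usually the simpler option; the explicit cumulative sum is mainly useful when you also want to inspect or plot the curve. Note that the count you get after standardization can differ from the unscaled example, because scaling changes how variance is spread across columns.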

Upvotes: 13
