rollotommasi

Reputation: 489

Principal Component Analysis, how many components?

I don't understand one point of PCA. Does PCA return the directions that maximize the variance for each feature? I mean, it will return a component for each feature of our original space, and only the k biggest components will be used as axes for the new subspace, right? So if I'm in 50-D and 49 features have a strong variance, can I just pass to a 49-D space?

Upvotes: 5

Views: 5544

Answers (1)

stackoverflowuser2010

Reputation: 40909

If your original data has 50 dimensions, then PCA will return 50 principal components. Each component is a direction in the original space (a linear combination of all 50 features), not one of the original features. It is up to you to choose a subset of k of those components that explains most of the variance, typically at least 90%. The PCA software you use will usually report how much variance each principal component explains, so add up those amounts starting from the largest and keep the smallest k that reaches 90% of the total variance. See this PCA tutorial:

In general, we would like to choose the smallest K such that 0.85 to 0.99 (equivalently, 85% to 99%) of the total variance is explained; these thresholds are common rules of thumb in PCA practice.

... When we say that PCA can reduce dimensionality, we mean that PCA can compute the principal components and the user can then choose the smallest number K of them that together explain 0.95 of the variance. A result is subjectively satisfactory when K is small relative to the original number of features D.
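A minimal NumPy sketch of the selection procedure described above, under stated assumptions: the function names `explained_variance_ratio` and `smallest_k` and both data sets are hypothetical, made up for illustration, and PCA is computed here directly from an SVD of the centered data rather than through any particular PCA library. The rule it implements: with the component variances sorted so that λ1 ≥ λ2 ≥ ... ≥ λD, K is the smallest k for which (λ1 + ... + λk) / (λ1 + ... + λD) reaches the chosen threshold (0.90 or 0.95 here).

```python
import numpy as np


def explained_variance_ratio(X):
    """Fraction of total variance explained by each principal component.

    X has shape (n_samples, n_features). Returns the principal axes (rows,
    sorted by decreasing variance) and their explained-variance ratios.
    """
    Xc = X - X.mean(axis=0)  # PCA operates on mean-centered data
    # Economy SVD: Xc = U @ diag(s) @ Vt; the rows of Vt are the principal axes
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    variances = s**2 / (X.shape[0] - 1)  # variance along each axis (descending)
    return Vt, variances / variances.sum()


def smallest_k(ratios, threshold):
    """Smallest k whose top-k components explain at least `threshold`."""
    cumulative = np.cumsum(ratios)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(ratios))  # guard against floating-point round-off near 1.0


rng = np.random.default_rng(0)
n, D = 500, 50

# Hypothetical data set 1: 50 features driven by only 3 hidden factors plus small noise.
Z = rng.normal(size=(n, 3))
W = rng.normal(size=(3, D))
X_structured = Z @ W + 0.01 * rng.normal(size=(n, D))

# Hypothetical data set 2: 50 independent features with equal variance (no structure).
X_noise = rng.normal(size=(n, D))

for name, X in [("structured", X_structured), ("noise", X_noise)]:
    axes, ratios = explained_variance_ratio(X)
    print(name, "components:", axes.shape[0],
          "K@0.90:", smallest_k(ratios, 0.90),
          "K@0.95:", smallest_k(ratios, 0.95))
```

Both data sets have 50 input dimensions and therefore yield 50 components. The structured set needs at most 3 of them to reach 95%, while the unstructured one needs most of the 50: the variance of individual features says little on its own; what matters is how the total variance is spread across the principal directions.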

Upvotes: 7
