jeff cheung

Reputation: 11

sklearn PCA: n_components equals the number of features problem

When I don't set the n_components parameter, the number of components kept equals the number of features of the dataframe.

If n_components isn't set, no components are dropped, so I expected the transformed dataframe to be the same as the original, but it turns out that it is not.

Why is the transformed dataframe different from the original dataframe?

import pandas as pd
from sklearn.decomposition import PCA

seed = 42  # any fixed integer
pca = PCA(random_state=seed)
pd1 = pd.DataFrame([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
pca.fit(pd1)
print(pd1)
print(pca.transform(pd1))

the output is:

   0  1  2
0  1  1  1
1  2  2  2
2  3  3  3
[[-1.73205081e+00 -1.11022302e-16  0.00000000e+00]
 [ 0.00000000e+00  0.00000000e+00  0.00000000e+00]
 [ 1.73205081e+00  1.11022302e-16  0.00000000e+00]]

Upvotes: 1

Views: 264

Answers (1)

parti82

Reputation: 161

The documentation on the sklearn PCA page says that when n_components is not set,

n_components == min(n_samples, n_features)

so that is why your result has 3 components.
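You can check this default empirically with a wide matrix where n_samples < n_features (a minimal sketch; the 2x5 data here is made up for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

# 2 samples, 5 features: with n_components unset, the number of
# fitted components is capped at min(n_samples, n_features) = 2.
X = np.arange(10, dtype=float).reshape(2, 5)
pca = PCA()  # n_components left unset
pca.fit(X)
print(pca.n_components_)  # 2, not 5
```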

PCA then projects your data onto those 3 principal axes, which are chosen to be orthogonal and to maximize the captured variance. The projection centers the data (subtracts the mean) and rotates it, which is why the transformed values differ from the original dataframe even though no components are dropped.
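You can see this on your own data: the three columns are perfectly correlated, so all of the variance ends up in the first component, and transform() returns the centered, rotated coordinates rather than the original values (a sketch using scikit-learn's PCA):

```python
import numpy as np
from sklearn.decomposition import PCA

# The question's data: three perfectly correlated features
X = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]], dtype=float)
pca = PCA().fit(X)

# All variance lies along one direction, so the first component
# explains ~100% and the other two carry (numerically) zero variance.
print(pca.explained_variance_ratio_)

# transform() centers the data and rotates it onto the principal
# axes: first column is +-sqrt(3), the rest is numerical noise.
print(pca.transform(X))
```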

For a more mathematical explanation of what PCA does, please check other sources such as the PCA Wikipedia page.

Upvotes: 3
