Reputation: 11
When I don't set the n_components
parameter, the number of components kept equals the number of features of the dataframe.
So if n_components
isn't set, I expected the transformed dataframe to be the same as the original, but it turns out that it is not.
Why is the transformed dataframe different from the original dataframe?
import pandas as pd
from sklearn.decomposition import PCA

seed = 0  # any fixed seed
pca = PCA(random_state=seed)
pd1 = pd.DataFrame([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
pca.fit(pd1)
print(pd1)
print(pca.transform(pd1))
The output is:
0 1 2
0 1 1 1
1 2 2 2
2 3 3 3
[[-1.73205081e+00 -1.11022302e-16 0.00000000e+00]
[ 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[ 1.73205081e+00 1.11022302e-16 0.00000000e+00]]
Upvotes: 1
Views: 264
Reputation: 161
The sklearn PCA documentation says that when n_components is not set,
n_components == min(n_samples, n_features)
That is why your result keeps 3 components.
PCA then converts your data into the coordinates of those 3 principal axes, along which the variance is maximized (and which are mutually orthogonal). Because the data is centered and rotated into this new basis, the transformed values differ from the originals even though no components are discarded.
For a more mathematical explanation of what PCA does, see other sources such as the PCA Wikipedia article.
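A small sketch (using numpy for the checks, not part of your original code) shows that with all components kept, the transform is just centering plus an orthogonal change of basis, so it is lossless and invertible, but not the identity:

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]], dtype=float)

pca = PCA()                 # n_components defaults to min(n_samples, n_features)
Z = pca.fit_transform(X)

# The transform subtracts the mean and projects onto the principal axes:
manual = (X - pca.mean_) @ pca.components_.T
print(np.allclose(Z, manual))                    # True

# Since no components were dropped, the original data is fully recoverable:
print(np.allclose(pca.inverse_transform(Z), X))  # True
```

So the transformed dataframe is your data expressed in a different (rotated, centered) coordinate system, which is why its values differ from the originals.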
Upvotes: 3