Reputation: 87
I have been playing around with sklearn PCA and it is behaving oddly.
from sklearn.decomposition import PCA
import numpy as np
identity = np.identity(10)
pca = PCA(n_components=10)
augmented_identity = pca.fit_transform(identity)
np.linalg.norm(identity - augmented_identity)
4.5997749080745738
Note that I set the number of dimensions to be 10. Shouldn't the norm be 0?
Any insight into why it is not would be appreciated.
Upvotes: 3
Views: 2264
Reputation: 24752
Although PCA computes its orthogonal components from the covariance matrix, the input to sklearn's PCA is the data matrix itself, not a covariance/correlation matrix. fit_transform centers the data and rotates it onto the principal axes, so the output coordinates are not expected to equal the input even when n_components equals the dimensionality; what is preserved is the covariance structure, not the raw values:
import numpy as np
from sklearn.decomposition import PCA
# Gaussian samples, 10 dimensions, identity covariance matrix
X = np.random.randn(100000, 10)
pca = PCA(n_components=10)
X_transformed = pca.fit_transform(X)
np.linalg.norm(np.cov(X.T) - np.cov(X_transformed.T))
0.044691263454134933
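To see that no information is actually lost in the original identity-matrix example, you can undo the rotation with inverse_transform. A minimal sketch (same setup as the question; the variable names here are my own):

```python
from sklearn.decomposition import PCA
import numpy as np

identity = np.identity(10)
pca = PCA(n_components=10)

# fit_transform centers the data and rotates it onto the principal axes,
# so `transformed` does not equal `identity` entry-wise.
transformed = pca.fit_transform(identity)

# With n_components equal to the full dimensionality, the rotation is
# invertible: inverse_transform undoes it and adds the mean back.
reconstructed = pca.inverse_transform(transformed)

# Reconstruction error is zero up to floating-point precision.
print(np.linalg.norm(identity - reconstructed))
```

So the nonzero norm in the question measures the difference between the data and its rotated, centered coordinates, not any loss of information.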
Upvotes: 4