Reputation: 81
I'm really puzzled; hopefully someone can show me what I'm missing. I'm trying to compute the principal components via two different methods:
import numpy as np
data = np.array([[ 2.1250045 , -0.17169867, -0.47799957],
[ 0.7400025 , -0.07970344, -0.99600106],
[ 0.15800177, 1.2993019 , -0.8030003 ],
[ 0.3159989 , 1.919297 , 0.24300112],
[-0.14800562, -1.0827019 , -0.2890004 ],
[ 0.26900184, -1.3816979 , 1.1239979 ],
[-0.5040008 , -2.9066994 , 1.6400006 ],
[-1.2230027 , -2.415702 , 3.1940014 ],
[-0.54700005, 1.757302 , -1.825999 ],
[-1.1860001 , 3.0623024 , -1.8090007 ]]) # this should already be mean centered
# Method 1. Scikit-Learn
from sklearn.decomposition import PCA
pca = PCA(n_components=3).fit(data)
print(pca.components_)
# Output:
# [[-0.04209988 -0.79261507  0.60826717]
#  [ 0.88594009 -0.31106375 -0.34401963]
#  [ 0.46188501  0.52440508  0.71530521]]
# Method 2. Manually with numpy
cov = np.cov(data.T)
evals , evecs = np.linalg.eig(cov)
# The next three lines sort the eigenpairs by descending eigenvalue
idx = np.argsort(evals)[::-1]
evecs = evecs[:,idx]
evals = evals[idx]
print(evecs.T)
# Output:
# [[ 0.04209988  0.79261507 -0.60826717]
#  [ 0.88594009 -0.31106375 -0.34401963]
#  [-0.46188501 -0.52440508 -0.71530521]]
The magnitudes of the eigenvector entries are the same, but two of the three eigenvectors come out with flipped signs. What I want is to reproduce the output of sklearn's PCA using only numpy. Thanks in advance for any suggestions.
Upvotes: 2
Views: 818
Reputation: 7211
That is expected, because the eigenspace of a matrix (the covariance matrix in your question) is unique, but the specific set of eigenvectors is not. It is too much to explain fully here, so I would recommend the answer on math.SE.
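Concretely, if v is an eigenvector of the covariance matrix with eigenvalue lam, then -v is too, so np.linalg.eig is free to return either orientation. A quick check, reusing cov, evals, and evecs from your code:

v = evals[0] * 0 + evecs[:, 0]  # first (sorted) eigenvector, copied as a column of evecs
lam = evals[0]                  # its eigenvalue
print(np.allclose(cov @ v, lam * v))    # True
print(np.allclose(cov @ -v, lam * -v))  # True -- the flipped vector is equally valid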
PS: Notice that you're dealing with a 3x3 covariance matrix, so you can picture the eigenvectors as vectors in 3D with x-, y-, and z-components. You should then see that two of your numpy eigenvectors point in exactly the opposite direction from sklearn's, and one points in the same direction. The sketch below shows one way to make the signs agree.
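If you want your numpy eigenvectors to come out with the same signs as sklearn's, you can mimic the sign convention sklearn applies internally (its svd_flip helper, at least in the versions I've checked; the convention could change across releases): flip each eigenvector so that the projected sample ("score") with the largest absolute value is positive. A minimal sketch, reusing data and the sorted evecs from your code:

scores = data @ evecs  # project the (mean-centered) samples onto each eigenvector
max_abs_rows = np.argmax(np.abs(scores), axis=0)               # row of the extreme score, per component
signs = np.sign(scores[max_abs_rows, range(scores.shape[1])])  # sign of that extreme score
evecs_flipped = evecs * signs  # broadcast: flips the columns whose extreme score was negative
print(evecs_flipped.T)         # matches pca.components_ for this data

This mirrors what sklearn does after its internal SVD, so it reproduces your sklearn output here; since it is an internal detail, treat it as an assumption rather than a guaranteed API contract.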
Upvotes: 1