Reputation: 2072
I have written a simple PCA implementation that calculates the covariance matrix and then uses linalg.eig
on that covariance matrix to find the principal components. When I use scikit's PCA for three principal components, I get almost the same result. My PCA function outputs the third column of the transformed data with flipped signs compared to what scikit's PCA function does. Now, I think there is a higher probability that scikit's built-in PCA is correct than that my code is correct. I have noticed that the third principal component/eigenvector has flipped signs in my case: if scikit's third eigenvector is (a,-b,-c,-d), then mine is (-a,b,c,d). I might be a bit shabby in my linear algebra, but I assume those are different results. The way I arrive at my eigenvectors is by computing the eigenvectors and eigenvalues of the covariance matrix using linalg.eig.
I would gladly try to find the eigenvectors by hand, but doing that for a 4x4 matrix (I am using the iris data set) is not fun.
The iris data set has 4 dimensions, so at most I can run PCA for 4 components. When I run it for one component, the results are equivalent. When I run it for two, they are also equivalent. For three, as I said, my function outputs flipped signs in the third column. When I run it for four, again the signs are flipped in the third column and all other columns are fine. I am afraid I cannot provide my actual code for this, since it is part of a project, but the general approach is roughly the sketch below.
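To illustrate the setup, here is a minimal sketch of this kind of covariance-plus-linalg.eig PCA on the iris data (illustrative only, not my project code; the variable names are made up):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

# Manual PCA: center the data, then eigendecompose the covariance matrix.
# (np.linalg.eigh would be better suited to a symmetric matrix, but eig
# matches what the question describes.)
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eig(cov)

# Sort eigenvectors by descending eigenvalue and project onto the top 3.
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order[:3]]
mine = Xc @ components

theirs = PCA(n_components=3).fit_transform(X)

# The columns agree up to sign: each principal axis is only defined up to +/- 1.
print(np.allclose(np.abs(mine), np.abs(theirs)))  # expected: True
```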
Upvotes: 0
Views: 743
Reputation: 66795
This is desired behaviour, and it is even stated in the documentation of sklearn's PCA:
Due to implementation subtleties of the Singular Value Decomposition (SVD), which is used in this implementation, running fit twice on the same matrix can lead to principal components with signs flipped (change in direction). For this reason, it is important to always use the same estimator object to transform data in a consistent fashion.
and it is quite obviously correct from a mathematical perspective: if v is an eigenvector of A with eigenvalue k, then
Av = kv
and thus also
A(-v) = -(Av) = -(kv) = k(-v)
so -v is an eigenvector with the same eigenvalue.
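A quick numerical check of this identity (a minimal sketch; the 2x2 matrix is just an arbitrary symmetric example):

```python
import numpy as np

# Arbitrary symmetric matrix, standing in for a covariance matrix.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

eigvals, eigvecs = np.linalg.eig(A)
k, v = eigvals[0], eigvecs[:, 0]

# Both v and -v satisfy the eigenvector equation for the same eigenvalue k.
print(np.allclose(A @ v, k * v))        # True
print(np.allclose(A @ (-v), k * (-v)))  # True
```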
Upvotes: 1
Reputation: 280390
So if scikit's third eigenvector is (a,-b,-c,-d), then mine is (-a,b,c,d).
That's completely normal. If v is an eigenvector of a matrix, then -v is an eigenvector with the same eigenvalue.
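If you need the two results to match exactly rather than just up to sign, you can fix a sign convention for both sets of eigenvectors before projecting. A minimal sketch of one such convention (the helper name and the convention itself are illustrative; scikit-learn's internal choice may differ):

```python
import numpy as np

def flip_signs(components):
    """Flip each eigenvector (column) so its largest-magnitude entry is positive.

    A common convention for making PCA output deterministic; this is a
    hypothetical helper, not necessarily the convention scikit-learn
    applies internally.
    """
    idx = np.argmax(np.abs(components), axis=0)
    signs = np.sign(components[idx, np.arange(components.shape[1])])
    return components * signs
```

Applying the same convention to your eigenvector matrix and to scikit's components (transposed so eigenvectors are columns) should make the projected data agree in all columns, not just in absolute value.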
Upvotes: 1