naz

Reputation: 2072

Alternative to numpy's linalg.eig?

I have written a simple PCA implementation that calculates the covariance matrix and then uses linalg.eig on that covariance matrix to find the principal components. When I use scikit's PCA for three principal components I get almost the same result: my PCA function outputs the third column of the transformed data with its signs flipped relative to what scikit's PCA function produces. I think it is more likely that scikit's built-in PCA is correct than that my code is. I have noticed that the third principal component/eigenvector has flipped signs in my case: if scikit's third eigenvector is (a,-b,-c,-d) then mine is (-a,b,c,d). My linear algebra might be a bit shabby, but I assume those are different results. The way I arrive at my eigenvectors is by computing the eigenvectors and eigenvalues of the covariance matrix using linalg.eig. I would gladly try to find the eigenvectors by hand, but doing that for a 4x4 matrix (I am using the iris data set) is no fun.

The iris data set has 4 dimensions, so I can run PCA for at most 4 components. When I run it for one component, the results match; for two components, they also match. For three, as I said, my function outputs flipped signs in the third column. For four, the signs are again flipped in the third column and all other columns are fine. I am afraid I cannot post the code itself; it is part of a project.
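To illustrate the setup without posting the project code, here is a generic sketch of what I am describing (this is not my actual code; np.linalg.eigh would arguably be safer for a symmetric covariance matrix, but eig is what I used):

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    X = load_iris().data
    Xc = X - X.mean(axis=0)                  # center the data

    # Manual PCA: eigendecomposition of the covariance matrix
    eigvals, eigvecs = np.linalg.eig(np.cov(Xc, rowvar=False))
    eigvals, eigvecs = eigvals.real, eigvecs.real  # covariance is symmetric, so eigenpairs are real
    order = np.argsort(eigvals)[::-1]        # sort by descending eigenvalue
    manual = Xc @ eigvecs[:, order[:3]]      # project onto the top 3 components

    # sklearn's PCA on the same data
    sk = PCA(n_components=3).fit_transform(X)

    # Each column matches up to an overall sign flip
    for j in range(3):
        print(j, np.allclose(manual[:, j], sk[:, j]),
              np.allclose(manual[:, j], -sk[:, j]))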

Upvotes: 0

Views: 743

Answers (2)

lejlot

Reputation: 66795

This is desired behaviour, and it is even stated in the documentation of sklearn's PCA:

Due to implementation subtleties of the Singular Value Decomposition (SVD), which is used in this implementation, running fit twice on the same matrix can lead to principal components with signs flipped (change in direction). For this reason, it is important to always use the same estimator object to transform data in a consistent fashion.

and it is quite obviously correct from a mathematical perspective: if v is an eigenvector of A with eigenvalue k, then

Av = kv

thus also

A(-v) = -(Av) = -(kv) = k(-v)
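
A quick numeric sanity check of this identity with a throwaway symmetric matrix (nothing sklearn-specific here):

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])
    k, V = np.linalg.eig(A)
    v = V[:, 0]                                  # any eigenvector of A

    print(np.allclose(A @ v, k[0] * v))          # True: Av = kv
    print(np.allclose(A @ (-v), k[0] * (-v)))    # True: A(-v) = k(-v)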

Upvotes: 1

user2357112

Reputation: 280390

So if scikit's third eigenvector is (a,-b,-c,-d) then mine is (-a,b,c,d).

That's completely normal. If v is an eigenvector of a matrix, then -v is an eigenvector with the same eigenvalue.
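
If you need the two results to agree exactly, you can impose a sign convention on both sides before comparing. A minimal sketch (fix_signs is a hypothetical helper, not part of numpy or sklearn; it mimics in spirit what sklearn's internal svd_flip does):

    import numpy as np

    def fix_signs(vecs):
        """Flip each column so its largest-magnitude entry is positive."""
        idx = np.abs(vecs).argmax(axis=0)                 # dominant row per column
        signs = np.sign(vecs[idx, np.arange(vecs.shape[1])])
        return vecs * signs

Running both your eigenvector matrix and sklearn's components_ (transposed so components are columns) through the same convention removes the ambiguity.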

Upvotes: 1
