First two Principal components are reversed

Question

I am trying to compute PCA of images using this code by Alyssa in python.

# Read the image into palmFingers_mask:  
# we use PCA and otain 2 principal components
y, x = np.nonzero(palmFingers_mask)
# Subtract mean from each dimension
x = x - np.mean(x)
y = y - np.mean(y)
coords = np.vstack([x, y]) 
# Covariance matrix and its eigenvectors and eigenvalues
cov = np.cov(coords)
evals[:,frm_cnt], evecs = np.linalg.eig(cov)
# Sort eigenvalues in decreasing order
sort_indices = np.argsort(evals[:,frm_cnt])[::-1]
evec1[:,frm_cnt], evec2[:,frm_cnt] = evecs[:, sort_indices]
x_v1, y_v1 = evec1[:,frm_cnt]  # Eigenvector with largest eigenvalue
x_v2, y_v2 = evec2[:,frm_cnt]
# Plot the principal components
scale = 20
plt.figure(1)
plt.plot(x, y, 'y.')
plt.plot([x_v1*-scale*2, x_v1*scale*2],
                 [y_v1*-scale*2, y_v1*scale*2], color='red')
plt.plot([x_v2*-scale, x_v2*scale],
                 [y_v2*-scale, y_v2*scale], color='blue')
plt.axis('equal')
plt.gca().invert_yaxis()  # Match the image system with origin at top left
plt.show()

I mark the two axes with red and blue lines (red is the larger component, length of lines does not signify anything).

While doing so, I came across one faulty case, shown below:

How can it be possible? This depicts that the red eigen vector is larger, but intuitively, it should not be the case. The axes (along blue line) should be the larger one, right? For all other images in my database, I see the axes are proper. Only for this image, it is interchanged; I have attached the input image (Img 209.png)

Is there anything wrong in my understanding?

This is what I expect:

For example, I have attached BinImg_20.png for which it works.

Attached files are on Dropbox.

MvG · Accepted Answer

I'd say the problem is here:

evec1, evec2 = evecs[:, sort_indices]

At this point you have

evecs = [[ 0.70926191  0.70494507]
         [-0.70494507  0.70926191]]
evec1 = [ 0.70926191  0.70494507]
evec2 = [-0.70494507  0.70926191]

The indexing by sort_indices correctly sorts the columns. But the unpacking assignment then unpacks by rows, leading to the mistake you observed.

See also the documentation for numpy.linalg.eig which states:

v: (..., M, M) array – The normalized (unit “length”) eigenvectors, such that the column v[:,i] is the eigenvector corresponding to the eigenvalue w[i].

And the tutorial writes:

Iterating over multidimensional arrays is done with respect to the first axis

I did some cross-checking in R, to make sure the computation as such is sane and the eigenvectors are as you would expect them to be. In case you are interested, here is the input and output from that, with the center off by one as R does 1-based indexing.

> install.packages("pixmap")
> library(pixmap)
> pix <- read.pnm("BinImg_209.pbm")
> mask = which(pix@grey != 0, arr.ind=T)
> mask = data.frame(x=mask[,"col"], y=mask[,"row"])
> prcomp(mask)         # single step principal component analysis
Standard deviations (1, .., p=2):
[1] 117.84952  44.23114

Rotation (n x k) = (2 x 2):
         PC1       PC2
x  0.7092619 0.7049451
y -0.7049451 0.7092619

> cov <- cov.wt(mask)  # manual covariance computation
> cov                  # show computed data
$cov
          x         y
x  7958.874 -5965.947
y -5965.947  7886.030

$center
       x        y 
593.1907 519.1019 

$n.obs
[1] 47909

> eigen(cov$cov)       # manual computation of eigenvectors
eigen() decomposition
$values
[1] 13888.510  1956.394

$vectors
           [,1]       [,2]
[1,] -0.7092619 -0.7049451
[2,]  0.7049451 -0.7092619

Here, too, the columns of the vectors matrix are the eigenvectors. The rotation matrix in prcomp is more clearly labeled: the columns are principal components, the rows are original coordinates.

First two Principal components are reversed

Answers (1)

Related Questions