Reputation: 625
I am trying to compute PCA of images using this code by Alyssa in python.
# Read the image into palmFingers_mask:
# we use PCA and otain 2 principal components
y, x = np.nonzero(palmFingers_mask)
# Subtract mean from each dimension
x = x - np.mean(x)
y = y - np.mean(y)
coords = np.vstack([x, y])
# Covariance matrix and its eigenvectors and eigenvalues
cov = np.cov(coords)
evals[:,frm_cnt], evecs = np.linalg.eig(cov)
# Sort eigenvalues in decreasing order
sort_indices = np.argsort(evals[:,frm_cnt])[::-1]
evec1[:,frm_cnt], evec2[:,frm_cnt] = evecs[:, sort_indices]
x_v1, y_v1 = evec1[:,frm_cnt] # Eigenvector with largest eigenvalue
x_v2, y_v2 = evec2[:,frm_cnt]
# Plot the principal components
scale = 20
plt.figure(1)
plt.plot(x, y, 'y.')
plt.plot([x_v1*-scale*2, x_v1*scale*2],
[y_v1*-scale*2, y_v1*scale*2], color='red')
plt.plot([x_v2*-scale, x_v2*scale],
[y_v2*-scale, y_v2*scale], color='blue')
plt.axis('equal')
plt.gca().invert_yaxis() # Match the image system with origin at top left
plt.show()
I mark the two axes with red and blue lines (red is the larger component, length of lines does not signify anything).
While doing so, I came across one faulty case, shown below:
How can it be possible? This depicts that the red eigen vector is larger, but intuitively, it should not be the case. The axes (along blue line) should be the larger one, right? For all other images in my database, I see the axes are proper. Only for this image, it is interchanged; I have attached the input image (Img 209.png)
Is there anything wrong in my understanding?
This is what I expect:
For example, I have attached BinImg_20.png for which it works.
Attached files are on Dropbox.
Upvotes: 0
Views: 460
Reputation: 60958
I'd say the problem is here:
evec1, evec2 = evecs[:, sort_indices]
At this point you have
evecs = [[ 0.70926191 0.70494507]
[-0.70494507 0.70926191]]
evec1 = [ 0.70926191 0.70494507]
evec2 = [-0.70494507 0.70926191]
The indexing by sort_indices
correctly sorts the columns. But the unpacking assignment then unpacks by rows, leading to the mistake you observed.
See also the documentation for numpy.linalg.eig
which states:
v
: (..., M, M) array – The normalized (unit “length”) eigenvectors, such that the columnv[:,i]
is the eigenvector corresponding to the eigenvaluew[i]
.
And the tutorial writes:
Iterating over multidimensional arrays is done with respect to the first axis
I did some cross-checking in R, to make sure the computation as such is sane and the eigenvectors are as you would expect them to be. In case you are interested, here is the input and output from that, with the center off by one as R does 1-based indexing.
> install.packages("pixmap")
> library(pixmap)
> pix <- read.pnm("BinImg_209.pbm")
> mask = which(pix@grey != 0, arr.ind=T)
> mask = data.frame(x=mask[,"col"], y=mask[,"row"])
> prcomp(mask) # single step principal component analysis
Standard deviations (1, .., p=2):
[1] 117.84952 44.23114
Rotation (n x k) = (2 x 2):
PC1 PC2
x 0.7092619 0.7049451
y -0.7049451 0.7092619
> cov <- cov.wt(mask) # manual covariance computation
> cov # show computed data
$cov
x y
x 7958.874 -5965.947
y -5965.947 7886.030
$center
x y
593.1907 519.1019
$n.obs
[1] 47909
> eigen(cov$cov) # manual computation of eigenvectors
eigen() decomposition
$values
[1] 13888.510 1956.394
$vectors
[,1] [,2]
[1,] -0.7092619 -0.7049451
[2,] 0.7049451 -0.7092619
Here, too, the columns of the vectors
matrix are the eigenvectors. The rotation matrix in prcomp
is more clearly labeled: the columns are principal components, the rows are original coordinates.
Upvotes: 2