Reputation: 154
I came across this question in datacamp.com:
Below are three scatter plots of the same point cloud. Each scatter plot shows a different set of axes (in red). In which of the plots could the axes represent the principal components of the point cloud?
Recall that the principal components are the directions along which the data varies.
Answer: Plot 1 and 3
My question is: what does the question mean? Why is plot 2 not part of the answer, since the axes can be rotated to fit the point cloud?
Upvotes: 2
Views: 406
Reputation: 8572
As suggested in the comments, this is a better fit for Cross Validated, or possibly math.stackexchange.
Now the answer is intuitively rather simple.
Principal components can be obtained by an iterative process such that:
1. The first principal component is the linear combination a_1 %*% X
which maximizes Var(a_1 %*% X)
subject to t(a_1) %*% a_1 = 1
2. The second principal component is the linear combination a_2 %*% X
which maximizes Var(a_2 %*% X)
subject to t(a_2) %*% a_2 = 1
and Cov(a_1 %*% X, a_2 %*% X) = 0
From this definition, note that Var(a_1 %*% X) = Var(-a_1 %*% X), so each principal component is only determined up to its sign.
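To spell out why the sign does not matter, here is a short derivation. It assumes X is a random vector with covariance matrix \Sigma (a symbol not used above) and writes a_1 %*% X as a_1^\top X.

```latex
\begin{align*}
\operatorname{Var}(a_1^\top X) &= a_1^\top \Sigma\, a_1, \\
\operatorname{Var}\big((-a_1)^\top X\big) &= (-a_1)^\top \Sigma\, (-a_1) = a_1^\top \Sigma\, a_1, \\
(-a_1)^\top (-a_1) &= a_1^\top a_1 = 1 .
\end{align*}
```

So -a_1 satisfies the same constraint and attains the same variance as a_1, which is why an axis and its flipped version describe the same principal component.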
From this definition we can see that:
1. Plots 1 and 3 are equivalent, as the first (longest) line points in the direction where the points are most spread out (show the greatest variance); their axes differ only in sign.
2. The axes in plot 2 cannot be the principal components, as they do not line up with the direction of greatest variance.
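The two points above can be checked numerically. The following is only a rough sketch in plain Python: the point cloud is made up (elongated along a 30-degree direction, which is an assumption and not the DataCamp data), and `variance_along` is a hypothetical helper, not a library function. It scans directions for the one with the largest projected variance, then compares it with the flipped direction and with a misaligned one.

```python
import math
import random


def variance_along(points, angle):
    """Sample variance of the points projected onto the unit vector at `angle` (radians)."""
    ux, uy = math.cos(angle), math.sin(angle)
    proj = [x * ux + y * uy for x, y in points]
    mean = sum(proj) / len(proj)
    return sum((p - mean) ** 2 for p in proj) / (len(proj) - 1)


# Hypothetical point cloud: wide spread along the 30-degree direction, narrow across it.
random.seed(0)
theta = math.radians(30)
points = []
for _ in range(500):
    s, t = random.gauss(0, 3.0), random.gauss(0, 0.5)
    points.append((s * math.cos(theta) - t * math.sin(theta),
                   s * math.sin(theta) + t * math.cos(theta)))

# Scan directions in 1-degree steps and keep the one with the largest variance.
best_deg = max(range(180), key=lambda d: variance_along(points, math.radians(d)))

var_best = variance_along(points, math.radians(best_deg))
var_flipped = variance_along(points, math.radians(best_deg + 180))  # same axis, opposite sign
var_rotated = variance_along(points, math.radians(best_deg + 45))   # a misaligned axis

print(best_deg, var_best, var_flipped, var_rotated)
```

The flipped direction should give the same variance as the best one (the situation in plots 1 and 3), while the rotated direction should give a clearly smaller variance (the situation in plot 2).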
Chapter 8, page 430 (ish) in Applied Multivariate Statistical Analysis contains a theoretical explanation in more detail.
Upvotes: 3
Reputation: 1984
As mentioned by @NelsonGon, this would probably be better on CrossValidated... but anyhow:
Plots 1 and 3 are correct because their axes are in fact those that maximize variance on the plane shown. The vectors can be flipped, as the sign of the eigenvectors is arbitrary in PCA (you'll notice that the red vectors in plots 1 and 3 lie along the same axes; one of them is just 'flipped'). Plot 2's vectors, however, clearly don't lie along the axes that maximize the spread of the point cloud, hence the answer in the post you're referring to.
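Another way to test a candidate pair of axes, sketched here under assumptions, is to project the points onto both axes and look at the covariance of the two projections: for principal axes it is zero. The data below is synthetic, and `principal_angle`, `projected_cov` and `unit` are hypothetical helpers written for this illustration. The principal angle comes from the closed-form expression for a 2x2 covariance matrix.

```python
import math
import random


def principal_angle(points):
    """Angle (radians) of the major axis of the 2x2 sample covariance matrix."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return 0.5 * math.atan2(2 * sxy, sxx - syy)


def projected_cov(points, u, v):
    """Sample covariance between the projections of the points onto u and onto v."""
    pu = [x * u[0] + y * u[1] for x, y in points]
    pv = [x * v[0] + y * v[1] for x, y in points]
    mu, mv = sum(pu) / len(pu), sum(pv) / len(pv)
    return sum((a - mu) * (b - mv) for a, b in zip(pu, pv)) / (len(points) - 1)


def unit(angle):
    """Unit vector at `angle` (radians)."""
    return (math.cos(angle), math.sin(angle))


# Hypothetical cloud, elongated along roughly 30 degrees.
random.seed(1)
t0 = math.radians(30)
points = []
for _ in range(500):
    s, t = random.gauss(0, 3.0), random.gauss(0, 0.5)
    points.append((s * math.cos(t0) - t * math.sin(t0),
                   s * math.sin(t0) + t * math.cos(t0)))

phi = principal_angle(points)
pc1, pc2 = unit(phi), unit(phi + math.pi / 2)
flipped_pc1 = (-pc1[0], -pc1[1])  # same axis, opposite sign
shift = math.radians(25)          # arbitrary misalignment
rot1, rot2 = unit(phi + shift), unit(phi + shift + math.pi / 2)

cov_aligned = projected_cov(points, pc1, pc2)
cov_flipped = projected_cov(points, flipped_pc1, pc2)
cov_rotated = projected_cov(points, rot1, rot2)

print(cov_aligned, cov_flipped, cov_rotated)
```

Flipping an axis only changes the sign of the covariance, so an aligned pair stays uncorrelated, whereas the rotated pair produces clearly correlated projections.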
Upvotes: 2