Reputation: 1
How should I interpret cluster visualizations produced with PCA? Suppose I have 13 variables (A, B, C, ...) in my data set and I want to see how they perform in unsupervised learning. Since clusters cannot be visualized in all 13 dimensions at once, I would use PCA for dimensionality reduction and then plot the clusters. How should I interpret the clusters formed in the 2 dimensions of the PCA?
Upvotes: 0
Views: 682
Reputation: 801
Essentially, you have projected your data into 2D in order to visualize it. But which 2D space have you projected it into? It is the 2D space that best preserves the variability of the data. Each axis in the 2D space represents a direction in the original space, which is a linear combination of the original variables, and the two directions are orthogonal to each other. So you can interpret the result as a visualization of the clusters in the space that gives the best linear reduction of the original space ("best" meaning it preserves as much of the variance in the data as any 2D linear projection can). Thus, you might expect cluster members to be closer to each other in 2D than to non-cluster members. However, this will not necessarily happen; if it does not, it suggests that the PCA dimensionality reduction did not preserve the structure of the data found by the clustering algorithm. (It does not necessarily mean that the clustering failed or that there is no inherent clusterable structure in the data, though. The structure might just be too non-linear to be conserved under the projection, or it might not be there at all.)
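To make this concrete, here is a minimal sketch using only NumPy. Everything in it is an assumption for illustration: the data are synthetic (three Gaussian blobs in 13 dimensions), `labels` stands in for whatever assignments your clustering algorithm produced, and PCA is computed directly from an SVD of the centered data rather than through a particular library. It shows what the two plot axes are (weights on the original 13 variables), how much variance they retain, and one rough way to check whether the clusters stay separated after projection.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a 13-variable data set: three groups of 50 points
# whose means differ, so there is cluster structure to look for.
n_clusters, n_per_cluster, n_features = 3, 50, 13
centers = rng.normal(scale=5.0, size=(n_clusters, n_features))
X = np.vstack([c + rng.normal(size=(n_per_cluster, n_features)) for c in centers])

# Stand-in for the cluster assignments returned by a clustering algorithm.
labels = np.repeat(np.arange(n_clusters), n_per_cluster)

# PCA via SVD of the centered data.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Each row of `components` is one plot axis: a unit-length vector of weights
# on the 13 original variables. The two rows are orthogonal to each other.
components = Vt[:2]

# Coordinates of every point in the 2D space (this is what gets scatter-plotted).
X_2d = Xc @ components.T

# Fraction of the total variance that each of the two axes preserves.
explained_ratio = S[:2] ** 2 / np.sum(S ** 2)

# Rough separation check in the 2D plot: average distance of points to their
# own cluster centroid versus average distance between cluster centroids.
centroids_2d = np.array([X_2d[labels == k].mean(axis=0) for k in range(n_clusters)])
within = np.mean([
    np.linalg.norm(X_2d[labels == k] - centroids_2d[k], axis=1).mean()
    for k in range(n_clusters)
])
between = np.mean([
    np.linalg.norm(centroids_2d[i] - centroids_2d[j])
    for i in range(n_clusters) for j in range(i + 1, n_clusters)
])

variable_names = [chr(ord("A") + i) for i in range(n_features)]  # A..M
for axis, weights in enumerate(components, start=1):
    top = np.argsort(-np.abs(weights))[:3]
    print(f"PC{axis} leans most on:", [variable_names[i] for i in top])
print("variance retained by the 2D plot:", explained_ratio.sum())
print("within-cluster spread:", within, "between-centroid distance:", between)
```

If the retained variance is low, or the within-cluster spread is comparable to the distance between centroids, the 2D picture is probably not a faithful view of the clusters, which matches the caveat above. A scatter plot of `X_2d` colored by `labels` would be the visual counterpart of the same check.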
For more intuition, see this question.
Upvotes: 1