Reputation: 513
I have a 115*8000 data where 115 is the number of features. When I use pca function of matlab like this
[coeff,score,latent,tsquared,explained,mu] = pca(data);
on my data. I get some values. I read on here that how can I reduce my data but one thing confuses me. The explained
data shows how much a feature weighs on calculation but do features get reorganized in this proces or features are exactly in same order as I give it to function?
Also I give 115 features but explained
shows 114. Why does it happen?
Upvotes: 1
Views: 117
Reputation: 1490
PCA does not "choose" some of your features and remove the rest. So you should not still be thinking about the original features after running PCA.
It is well-explained here on Wikipedia. You are converting your samples from the space defined by your original features to a space where features are linearly uncorrelated and called "principal components". Note: these components are no longer the original features.
An example of this in 2D could be: you have a vector z=(2,3)
defined in your Euclidean space. It needs 2 features (the x and the y). If we change the space and define it using the coordinate vectors v=(2,3)
and w an orthogonal vector to v, then z=(1,0) i.e. z=1.v+0.w
and can now be represented with only 1 feature (the first coordinate!).
The link that you shared explains exactly (in the selected answer) how you can go about using the outputs of the pca function to reduce your dimensionality.
(As noted by Ander you do not care about the last components since these are the weakest anyway and you want to drop them)
Upvotes: 1
Reputation: 35525
The data is not "reorganized" in PCA, is transformed to a new space. When you crop the PCA space, that is your data, but you are not going to be able to visualize/understand it there, you need to convert it back to "normal" space, using eigenvectors and such.
explained gives you 114 because you now what is the answer with 115! 100% of the data can be explained with the whole data!
Read about it further in this answer: Significance of 99% of variance covered by the first component in PCA
Upvotes: 1