Reputation: 6290
I have a matrix M where the columns are data points and the rows are features. I want to run PCA on it and keep only the first component, the one with the highest variance.
I know that I can do this in MATLAB with [coeff,score,latent] = pca(M'). I think I have to transpose M first.
How can I now select the first component? I'm not sure what the three output matrices mean.
Second, I also want to calculate the percentage of variance explained by each component. How can I do this?
Upvotes: 1
Views: 1990
Reputation: 1534
If your matrix has dimensions m x n, where m is the number of cases and n is the number of variables (for the matrix in the question, transpose it first with M = M'):
% First you might want to normalize the matrix...
M = normalize(M);
% means very close to zero
round(mean(M),10)
% standard deviations all one
round(std(M),10)
% Perform a singular value decomposition of the matrix
[U,S,V] = svd(M);
% First Principal Component is the first column of V
V(:,1)
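% Calculate percentage of variance explained by each component
% (the squared singular values are proportional to the component variances)
s = diag(S);
(s.^2 / sum(s.^2)) * 100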
Upvotes: 0
Reputation: 2652
Indeed, you should transpose your input to have rows as data points and columns as features:
[coeff, score, latent, ~, explained] = pca(M');
The principal components are given by the columns of coeff in order of descending variance, so the first column holds the most important component. The variances for each component are given in latent, and the percentage of total variance explained is given in explained.
firstCompCoeff = coeff(:,1);
firstCompVar = latent(1);
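To make the outputs concrete, here is a small sketch on made-up random data. The names X, Xc, firstCompScore, maxErr and explainedManual are mine for illustration and are not part of the pca API. It extracts the data projected onto the first component from score, and checks that explained is simply latent rescaled to percentages:

```matlab
rng(0);                               % reproducible toy data (assumption: random input)
M = randn(5, 200);                    % 5 features (rows) x 200 data points (columns)
X = M';                               % pca expects rows = observations

[coeff, score, latent, ~, explained] = pca(X);

% Data projected onto the first principal component (200 x 1)
firstCompScore = score(:,1);

% score is the centered data expressed in the basis given by coeff
Xc = bsxfun(@minus, X, mean(X, 1));
maxErr = max(abs(Xc * coeff(:,1) - firstCompScore));

% explained is latent expressed as a percentage of the total variance
explainedManual = 100 * latent / sum(latent);
```

So if "selecting the first component" means the reduced 1-D representation of your data, score(:,1) is what you want; coeff(:,1) is the direction in feature space.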
For more information, see the pca documentation.
Note that the pca function requires the Statistics and Machine Learning Toolbox (called the Statistics Toolbox in older releases). If you don't have it, you can either search the internet for an alternative or implement it yourself using svd.
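As a rough sketch of the svd route, assuming you want the same quantities that pca returns: center the data, take the SVD, and rescale the squared singular values. The toy data and the variable names mirror the pca outputs by my choice only, and the sign of each column may differ from what pca gives.

```matlab
rng(0);                               % reproducible toy data (assumption: random input)
M = randn(5, 200);                    % features (rows) x data points (columns)
X = M';                               % rows = observations, columns = features
n = size(X, 1);

Xc = bsxfun(@minus, X, mean(X, 1));   % center each feature (column)
[U, S, V] = svd(Xc, 'econ');          % Xc = U*S*V'
s = diag(S);                          % singular values, in descending order

coeff     = V;                        % principal directions, one per column
score     = U * S;                    % projected data, equal to Xc * V
latent    = s.^2 / (n - 1);           % variance along each component
explained = 100 * latent / sum(latent);

firstCompCoeff = coeff(:,1);          % direction of the first component
firstCompScore = score(:,1);          % data projected onto it
```

This only centers the data. If your features are on very different scales, you may also want to divide each column by its standard deviation before the svd call.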
Upvotes: 3