Reputation: 25
I have a feature matrix of size [4096 x 180], where 180 is the number of samples and 4096 is the feature vector length of each sample.
I want to reduce the dimensionality of the data using PCA.
I tried using the built-in pca function of MATLAB, [V U]=pca(X), and reconstructed the data by X_rec= U(:, 1:n)*V(:, 1:n)', n being the dimension I chose. This returns a matrix of 4096 x 180.
Now I have 3 questions:
When I chose n as 200, it gave an error because the matrix dimension was exceeded, which led me to assume that we cannot choose a reduced dimension larger than the sample size. Is this true? I have to use the reduced dimension feature set for further classification.
If anyone can provide a detailed step-by-step explanation of the PCA code for this I would be grateful. I have looked in many places but my confusion still persists.
Upvotes: 1
Views: 1377
Reputation: 861
You may want to refer to the MATLAB example that analyses the cities data.
Here is some simplified code:
load cities;
[~, pca_scores, ~, ~, var_explained] = pca(ratings);
Here, pca_scores is the data projected onto the principal components, and var_explained gives the percentage of the total variance explained by each component. You do not need to do any explicit multiplication after running pca. MATLAB will give you the components directly.
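To see why no extra multiplication is needed, here is a small sketch on the same cities data. It assumes the Statistics Toolbox is available; manual_scores and max_diff are just illustrative names. It checks that the scores returned by pca match the centered data multiplied by the coefficient matrix.

```matlab
load cities;                                   % provides ratings (one city per row)
[coeff, pca_scores] = pca(ratings);

% pca centers each column before projecting, so do the same by hand
centered = bsxfun(@minus, ratings, mean(ratings, 1));
manual_scores = centered * coeff;

% Largest absolute difference between the two results (floating-point noise only)
max_diff = max(abs(manual_scores(:) - pca_scores(:)));
```

If max_diff is negligible compared with the scale of ratings, the second output of pca is already the projected data.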
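In your case, consider that data X is a 4096-by-180 matrix, i.e. you have 4096 samples and 180 features (pca treats each row as a sample and each column as a feature, so if your 180 samples are stored as columns, as in the question, pass X' instead). Your goal is to reduce dimensionality such that you have p features, where p < 180. In MATLAB, you can simply run the following: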
p = 100;
[~, pca_scores, ~, ~, var_explained] = pca(X, 'NumComponents', p);
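pca_scores will be a 4096-by-p matrix. var_explained will be a vector holding the percentage of variance explained by each component; its first p entries correspond to the columns of pca_scores.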
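For the layout described in the question (4096 features in rows, 180 samples in columns), a rough sketch could look like the following. The random X is only a stand-in for the real data, and the 99% threshold is one common heuristic for choosing p, not a rule. The data is transposed so that each row is a sample, and the sixth output of pca (the column means, called mu here) is used to undo the centering when reconstructing.

```matlab
rng(0);                                  % reproducible placeholder data
X = randn(4096, 180);                    % stand-in: 4096 features x 180 samples

% Full decomposition first, only to inspect the explained variance.
% pca expects one sample per row, hence the transpose (180-by-4096).
[~, ~, ~, ~, var_explained] = pca(X');

% Smallest number of components whose cumulative variance reaches 99%
p = find(cumsum(var_explained) >= 99, 1);

% Keep only the first p components
[coeff, pca_scores, ~, ~, ~, mu] = pca(X', 'NumComponents', p);
% pca_scores is 180-by-p: one reduced feature vector per sample
% coeff is 4096-by-p: principal directions in the original feature space

% Approximate reconstruction, returned to the original 4096-by-180 layout
X_rec = bsxfun(@plus, pca_scores * coeff', mu)';
```

With 180 samples the centered data has at most 179 non-trivial components, which is why asking for 200 cannot work. pca_scores is what you would pass on to the classifier.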
To answer your questions:
pca_scores is your reduced dimension data.
Use the var_explained vector to decide how many components to keep. Typically you want to retain about 99% of the variance of the features. You can read more about this here.
Upvotes: 3