Reputation: 2459
I'm building a logistic regression model in MATLAB with the Classification Learner app.
I ran PCA in Matlab:
[coeff, score, latent, tsquared, explained] = pca(CreditNumeric);
Here's the coeff, score, latent and explained output:
I want to use the PCA results to reduce the number of input features I pass to the Classification Learner. How do I use the PCA results to select a small set of features (say 5-7) that explains 95% of the variance of the data?
Upvotes: 1
Views: 2091
Reputation: 1
It is actually very simple. In the Classification Learner, once you import all your variables, you can choose which features to use to train your model(s) (see the last screenshot, where the "Feature selection" button appears next to Import Data). There you can select as many variables as you like, train several combinations, and compare the results at the end.
The real issue, I think, is whether your 5-7 features (in this case principal components) actually describe 95% of the variance of the data, right?
To check this, you could follow one of two approaches:
-Import all your original variables into the Classification Learner instead of the principal components, and use the PCA button that, in newer versions of MATLAB, appears next to the Feature selection one. There you can set the percentage of explained variance (95) and the number of components (7).
-Run pca in MATLAB beforehand, so you can see, control, and analyze all the results, and then train on the principal components with the learner. This way you actually know how many components your model needs to explain 95% of the variance. It may not be 5-7, or it may be fewer than that, so explore first.
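The second approach can be sketched roughly as follows. This is only a minimal illustration, not a definitive recipe: the matrix X is synthetic stand-in data (an assumption, replace it with your own CreditNumeric), and the names cumExplained, numComponents and reducedScores are made up for this example. Note that the retained scores are linear combinations of the original features, not a subset of them.

```matlab
% Stand-in data so the sketch runs on its own (assumption);
% replace X with your numeric matrix, e.g. CreditNumeric.
rng(0);
X = randn(200, 10) * randn(10, 10);

% Same call as in the question; explained is the percentage of
% total variance per component, sorted in descending order.
[coeff, score, latent, tsquared, explained] = pca(X);

% Running total of explained variance across components.
cumExplained = cumsum(explained);

% Smallest number of components that reaches at least 95%.
numComponents = find(cumExplained >= 95, 1, 'first');

% Keep only those component scores as predictors for the learner.
reducedScores = score(:, 1:numComponents);

fprintf('%d components explain %.2f%% of the variance\n', ...
    numComponents, cumExplained(numComponents));
```

If numComponents comes out larger than 7, then 5-7 components will not reach 95% on your data, and you would need to either keep more components or accept a lower threshold.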
That is my suggestion. Good luck!
Upvotes: 0