Reputation: 197
I have a 500x1000 feature matrix, and principal component analysis says that over 99% of the total variance is covered by the first component. So I replace each 1000-dimensional point with a 1-dimensional one, giving a 500x1 feature matrix (using Matlab's pca function). But my classifier accuracy, which was initially around 80% with 1000 features, now drops to 30% with this single feature, even though more than 99% of the variance is accounted for by it. What could explain this, or are my methods wrong?
(This question partly arises from my earlier question "Significance of 99% of variance covered by the first component in PCA".)
Edit: I used Weka's principal components method to perform the dimensionality reduction, and a support vector machine (SVM) classifier.
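For reference, the Matlab step amounts to something like this (a minimal sketch, assuming `X` is the 500x1000 feature matrix with one sample per row; variable names are illustrative):

```matlab
% X: assumed 500x1000, one sample per row, one feature per column
[coeff, score, ~, ~, explained] = pca(X);      % pca centers X, then projects it
fprintf('Variance explained by PC1: %.2f%%\n', explained(1));  % >99 in my case
Xreduced = score(:, 1);    % 500x1: each sample replaced by its first-PC score
```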
Upvotes: 2
Views: 5264
Reputation: 19169
Principal components do not necessarily have any relationship to classification accuracy. There could be a two-variable situation where 99% of the variance corresponds to the first PC, yet that PC has no relation to the underlying classes in the data, while the second PC (which contributes only 1% of the variance) is the one that can separate the classes. If you keep only the first PC, you lose the very feature that provides the ability to classify the data.
In practice, smaller (lower-variance) PCs are often associated with noise, so there can be a benefit in removing them, but there is no guarantee of this.
Consider a case where you have two variables: a person's mass (in grams) and body temperature (in degrees Celsius). You want to predict which people have the flu and which do not. In this case, mass has a much greater variance but probably no correlation with the flu, whereas temperature, which has low variance, has a strong correlation with the flu. After the principal components transformation, the first PC will be strongly aligned with mass (since it has much greater variance), so if you dropped the second PC you would be losing almost all of your classification accuracy.
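Here is a small synthetic sketch of that scenario (illustrative Matlab, not the questioner's actual data; the numbers and the crude threshold rule are assumptions for demonstration):

```matlab
rng(1);                                   % reproducibility
n = 250;
mass = 70000 + 15000*randn(2*n, 1);       % grams: huge variance, class-independent
temp = [37.0 + 0.3*randn(n, 1);           % healthy body temperatures
        39.0 + 0.5*randn(n, 1)];          % flu body temperatures
y = [zeros(n, 1); ones(n, 1)];            % 0 = healthy, 1 = flu
X = [mass, temp];

[~, score, ~, ~, explained] = pca(X);
fprintf('Variance explained: PC1 %.4f%%, PC2 %.4f%%\n', explained(1), explained(2));

% Crude one-feature threshold classifier (the sign of a PC is arbitrary,
% so take the better of the two threshold directions)
acc = @(s) max(mean((s > mean(s)) == y), mean((s < mean(s)) == y));
fprintf('Accuracy using PC1 only: %.2f\n', acc(score(:, 1)));  % near chance
fprintf('Accuracy using PC2 only: %.2f\n', acc(score(:, 2)));  % near perfect
```

With this construction, PC1 explains essentially all of the variance yet should classify at roughly chance level, while PC2 should classify almost perfectly.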
It is important to remember that principal component analysis is an unsupervised transformation of the data: it does not consider the labels of your training data when computing the transformation (as opposed to a supervised method such as Fisher's linear discriminant).
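For contrast, a supervised projection keeps the informative direction. A minimal continuation of the sketch above (reusing the synthetic `X` and `y`; `fitcdiscr` is Matlab's linear discriminant, which for two classes is closely related to Fisher's formulation):

```matlab
% Reuses X and y from the synthetic example above
mdl = fitcdiscr(X, y);   % supervised: the labels shape the discriminant direction
fprintf('LDA training accuracy: %.2f\n', mean(predict(mdl, X) == y));
```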
Upvotes: 12