Reputation: 2889
I am using KNN to classify handwritten digits. I have now also implemented PCA to reduce the dimensionality: from 256 dimensions I went down to 200. But I only notice about a ~0.10% loss of information, even though I deleted 56 dimensions. Shouldn't the loss be bigger? Only when I drop to 5 dimensions do I get a ~20% loss. Is this normal?
Upvotes: 15
Views: 15122
Reputation: 7798
You're saying that after removing 56 dimensions, you lost nearly no information? Of course; that's the point of PCA! Principal Component Analysis, as the name states, helps you determine which dimensions carry the information. You can then remove the rest, which makes up the biggest part of the dimensions but carries very little of the information.
If you want an example: in gene analysis, I have read papers where the dimension is reduced from 40'000 to 100 with PCA, then they do some magical stuff and end up with an excellent classifier using 19 dimensions. This implicitly tells you that they lost virtually no information when they removed 39'900 dimensions!
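To see this concretely on digit data, here is a minimal sketch (my assumption: scikit-learn is available, and I use its bundled 8x8 digits, i.e. 64 pixels, as a stand-in for your 256-pixel images) that prints how much variance the first k principal components retain:

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA

    X = load_digits().data              # shape (1797, 64): one 8x8 image per row
    pca = PCA().fit(X)                  # fit with all components kept

    # Cumulative fraction of the total variance explained by the first k components.
    cumvar = np.cumsum(pca.explained_variance_ratio_)
    for k in (5, 20, 40, 64):
        retained = cumvar[k - 1]
        print(f"{k:2d} components -> {retained:.4f} retained (loss ~ {1 - retained:.2%})")

The curve is steep at the start and nearly flat at the end: the last few dozen components together explain almost nothing, which is exactly why dropping them costs you so little.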
Upvotes: 10
Reputation: 7592
That's normal, yes (and, like Fezvez said, the point of what you did). Your case is actually a good example where you can see how this is possible.
Take a look at your data (that's always important in machine learning: know your data). If you have images of black handwritten digits on a white background, there is a high probability that the pixels in some corners are white for all samples (I had that in one corner when I did machine learning on handwritten digits). So there is actually no information in that pixel whatsoever. If you drop it as an input for your KNN or ANN or whatever, you will get the same results. You can check this directly on your data; see the sketch below.
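A quick way to verify this (a sketch, again assuming scikit-learn's bundled 8x8 digits rather than your 16x16 data): compute the variance of each pixel position across all samples. A pixel with (near-)zero variance is constant background and carries no information:

    import numpy as np
    from sklearn.datasets import load_digits

    X = load_digits().data                       # shape (1797, 64), one row per image
    per_pixel_var = X.var(axis=0).reshape(8, 8)  # variance of each pixel position

    print(np.round(per_pixel_var, 1))            # border pixels show (near-)zero variance
    print("near-constant pixels:", int((per_pixel_var < 0.1).sum()))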
Upvotes: 1