Reputation: 3324
I run Spark (2.2) PCA with three variables: x, y and z. I get:
+-----------------------------------------------------------+
|pcaFeatures |
+-----------------------------------------------------------+
|[4192.998527751072,7.815744760976605,2.064076348440629] |
|[934.9987857492071,6.178849121007534,2.0229856767680876] |
|[81.99880210954893,6.012098465539804,2.0127405793319535] ...
So these are eigenvectors. Do they correspond to x, y and z, in that order? If PCA is about feature reduction, can I say that x explains most of the data and just use x? Can I express this mathematically as a percentage, since I have a vector of values?
Upvotes: 2
Views: 487
Reputation: 1398
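PCA is used to reduce the number of dimensions. If the input dimensionality is 3 (x, y, z) and the output dimensionality is also 3, then no dimensionality reduction has actually taken place, and running PCA this way doesn't make much sense.
The output features don't correspond to x, y and z. They are a new set of features, each one a linear combination of the original variables, and the rows you see are your data points expressed in those new coordinates, not eigenvectors. So no, you can't say that x explains most of the data.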
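On the percentage part of the question: the usual measure is the share of total variance captured by each principal component. As far as I know, the fitted `PCAModel` in Spark ML exposes this as `explainedVariance`, with the component weights in `pc`. To show what those numbers mean without needing a cluster, here is a minimal sketch in plain Python. The data is made up for illustration, the function names are hypothetical, and power iteration is used only because it is short, not because Spark computes it this way.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of the columns of `rows`."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def top_eigenpair(m, iters=1000):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix,
    found by power iteration."""
    d = len(m)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # no variance left to extract
            return 0.0, v
        v = [x / norm for x in w]
    mv = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
    return sum(v[i] * mv[i] for i in range(d)), v


def pca_summary(rows):
    """Explained-variance ratios and component weights, largest first."""
    m = covariance_matrix(rows)
    d = len(m)
    total = sum(m[i][i] for i in range(d))  # trace = sum of all eigenvalues
    ratios, components = [], []
    for _ in range(d):
        lam, v = top_eigenpair(m)
        ratios.append(lam / total)
        components.append(v)
        # Deflate: remove the component just found, then look for the next one.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return ratios, components


# Made-up (x, y, z) rows: y moves together with x, z barely varies.
rows = [(7.0, 1.0, 1.8), (7.0, 3.0, 2.2), (13.0, 7.0, 2.2), (13.0, 9.0, 1.8)]

ratios, components = pca_summary(rows)
for ratio, comp in zip(ratios, components):
    weights = [round(c, 3) for c in comp]
    print(f"{ratio:.1%} of variance, weights on (x, y, z): {weights}")
```

With this toy data the first component carries roughly 97% of the variance, but its weights fall on both x and y, so it is a blend of the two rather than "x". That is the percentage you can quote, and it describes a component, not an original variable.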
Upvotes: 2