schoon

Reputation: 3324

How do I interpret Spark PCA output?

I run Spark (2.2) PCA with three variables: x, y and z. I get:

+-----------------------------------------------------------+
|pcaFeatures                                                |
+-----------------------------------------------------------+
|[4192.998527751072,7.815744760976605,2.064076348440629]    |
|[934.9987857492071,6.178849121007534,2.0229856767680876]   |
|[81.99880210954893,6.012098465539804,2.0127405793319535] ...
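For context, the PCA stage is set up roughly like this (the DataFrame df and the assembler step here are illustrative, not my exact code):

    import org.apache.spark.ml.feature.{PCA, VectorAssembler}

    // Assemble x, y and z into a single vector column
    val assembler = new VectorAssembler()
      .setInputCols(Array("x", "y", "z"))
      .setOutputCol("features")
    val assembled = assembler.transform(df)

    // Fit PCA keeping all three components and project the data
    val pca = new PCA()
      .setInputCol("features")
      .setOutputCol("pcaFeatures")
      .setK(3)
      .fit(assembled)

    pca.transform(assembled).select("pcaFeatures").show(false)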

So these are eigenvectors. Do they correspond to x, y and z in that order? If PCA is about feature reduction, can I say that x explains most of the data and just use x? Can I express this mathematically as a percentage, since I have a vector of values?

Upvotes: 2

Views: 487

Answers (1)

addmeaning

Reputation: 1398

PCA is used to reduce the number of dimensions. If the input dimensionality is 3 (x, y, z) and the output dimensionality is also 3, then no dimensionality reduction has actually happened and running PCA doesn't make much sense.

The output features don't correspond to x, y, and z; they are a new set of features (the data projected onto the principal components). And no, you can't say that the first feature explains most of the data just because its values are the largest.
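If you want a percentage, get it from the fitted model rather than from the projected values. A minimal sketch, assuming the features column was assembled from x, y and z as in the question (explainedVariance and pc are members of the fitted PCAModel since Spark 2.0):

    import org.apache.spark.ml.feature.PCA

    // Reduce the three inputs to k = 2 principal components
    val pcaModel = new PCA()
      .setInputCol("features")     // the assembled (x, y, z) vector column
      .setOutputCol("pcaFeatures")
      .setK(2)
      .fit(assembled)

    // Fraction of the total variance carried by each new feature,
    // e.g. [0.95, 0.04] means the first component explains ~95 %
    println(pcaModel.explainedVariance)

    // 3 x 2 matrix of loadings: how much each of x, y, z contributes
    // to each principal component
    println(pcaModel.pc)

    val reduced = pcaModel.transform(assembled).select("pcaFeatures")

The magnitudes inside pcaFeatures are not percentages; the split of variance across the new features comes from explainedVariance.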

Upvotes: 2
