Reputation: 7131
I am running PCA on large spreadsheets, and I'm picking my PCs according to the loadings. As far as I have read, since my data have different units, standardization is a must before performing the PCA.
Does the function prcomp() inherently perform standardization?
I was reading the prcomp() help file and saw this under the arguments of prcomp():
scale. a logical value indicating whether the variables should be scaled to have
unit variance before the analysis takes place. The default is FALSE for
consistency with S, but in general scaling is advisable. Alternatively, a
vector of length equal the number of columns of x can be supplied. The
value is passed to scale.
Does "scaling variables to have unit variance" mean standardization?
I am currently using this command:
prcomp(formula = ~., data=file, center = TRUE, scale = TRUE, na.action = na.omit)
Is that enough, or should I standardize in a separate step?
Thanks,
Upvotes: 3
Views: 14201
Reputation: 174948
Yes, scale = TRUE will result in all variables being scaled to have unit variance (i.e. a variance of 1, and hence a standard deviation of 1). This is the common definition of "standardise", though there are other ways to do it. center = TRUE mean-centres the data, i.e. the mean of a variable is subtracted from each observation of that variable.
When you do this (scale = TRUE, center = TRUE), the PCA is performed on the correlation matrix of your data set instead of on its covariance matrix. Hence the PCA finds axes that explain the correlations between variables rather than their covariances.
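As a rough check of the correlation-matrix point, the sketch below compares prcomp() with an eigen-decomposition of cor(). It uses the built-in USArrests data purely as a stand-in (an assumption; it is not the asker's spreadsheet) and the documented argument name scale. with its trailing dot.

```r
# Illustrative only: USArrests is a built-in R data set, not the asker's data.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Eigen-decomposition of the correlation matrix of the same data
eig <- eigen(cor(USArrests))

# The PC variances should match the eigenvalues of the correlation matrix
print(all.equal(pca$sdev^2, eig$values))

# The loadings should match the eigenvectors, up to the sign of each column
print(all.equal(abs(unname(pca$rotation)), abs(eig$vectors)))
```

With scale. = FALSE the same comparison would be against eigen(cov(USArrests)) instead.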
Upvotes: 5
Reputation: 60984
If by standardization you mean that the mean of each column is subtracted and each column is then divided by its standard deviation, then using scale = TRUE and center = TRUE is what you want.
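To see that no separate standardization step is needed, one can standardize by hand with scale() and compare the result against prcomp()'s built-in option. This is a minimal sketch; the USArrests data and the variable names x, z, pca_manual and pca_builtin are assumptions chosen for illustration.

```r
# Illustrative only: USArrests stands in for the asker's spreadsheet.
x <- USArrests

# Standardize by hand: subtract column means, divide by column standard deviations
z <- scale(x, center = TRUE, scale = TRUE)

# PCA on the pre-standardized data, with prcomp() told not to centre or scale again
pca_manual <- prcomp(z, center = FALSE, scale. = FALSE)

# PCA on the raw data, letting prcomp() do the centring and scaling
pca_builtin <- prcomp(x, center = TRUE, scale. = TRUE)

# Both routes should give the same standard deviations and loadings
print(all.equal(pca_manual$sdev, pca_builtin$sdev))
print(all.equal(unname(pca_manual$rotation), unname(pca_builtin$rotation)))
```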
Upvotes: 3