Error404

Reputation: 7131

standardization of data in R

I am running PCA on large spreadsheets, and I'm picking my PCs according to the loadings. As far as I have read, since my variables have different units, standardization is a must before performing the PCA.

Does the function prcomp() inherently perform standardization?

I was reading the prcomp() help file and saw this under the arguments of prcomp():

scale. a logical value indicating whether the variables should be scaled to have
       unit variance before the analysis takes place. The default is FALSE for 
       consistency with S, but in general scaling is advisable. Alternatively, a
       vector of length equal the number of columns of x can be supplied. The
       value is passed to scale.

Does "scaling variables to have unit variance" mean standardization?

I am currently using this command:

prcomp(formula = ~., data=file, center = TRUE, scale = TRUE, na.action = na.omit)

Is this enough, or should I do a separate standardization step?

Thanks,

Upvotes: 3

Views: 14201

Answers (2)

Gavin Simpson

Reputation: 174948

Yes, scale = TRUE will result in all variables being scaled to have unit variance (i.e. a variance of 1, and hence a standard deviation of 1). This is the common definition of "standardise", although there are other ways to do it. center = TRUE mean-centres the data, i.e. the mean of each variable is subtracted from every observation of that variable.

When you do this (scale = TRUE, center = TRUE), the PCA is performed on the correlation matrix of your data set rather than on its covariance matrix. Hence the PCA finds axes that explain the correlations between variables rather than their covariances.
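
As a quick check of the correlation-matrix point, here is a minimal sketch. It assumes the built-in USArrests data set purely for illustration (it is not the asker's data), and it spells the argument out as scale., which is the full name in the help file; scale = TRUE reaches the same argument through R's partial argument matching.

```r
# Illustrative data only: USArrests ships with R and has variables on different scales
x <- USArrests

# PCA on centred, unit-variance variables
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Eigen decomposition of the correlation matrix of the raw data
ev <- eigen(cor(x))

# The PC variances (sdev^2) should match the eigenvalues of the correlation matrix
all.equal(pca$sdev^2, ev$values)

# The loadings should match the eigenvectors, up to the sign of each column
all.equal(abs(unname(pca$rotation)), abs(ev$vectors))
```

With center = TRUE and scale. = FALSE, the same comparison would be against eigen(cov(x)) instead.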

Upvotes: 5

Paul Hiemstra

Reputation: 60984

If by standardization you mean that the mean of each column is subtracted and each column is then divided by its standard deviation, then using scale = TRUE and center = TRUE is what you want.
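
To see that no separate standardization step is needed, the following sketch compares a manual standardization via scale() with prcomp()'s own centring and scaling. The USArrests data set is an assumption made for illustration; substitute your own numeric data frame.

```r
# Illustrative data only (not the asker's spreadsheet)
x <- USArrests

# Manual standardization: subtract column means, divide by column standard deviations
z <- scale(x, center = TRUE, scale = TRUE)

# PCA on the pre-standardized data, with prcomp's own centring/scaling switched off
pca_manual <- prcomp(z, center = FALSE, scale. = FALSE)

# PCA on the raw data, letting prcomp standardize internally
pca_builtin <- prcomp(x, center = TRUE, scale. = TRUE)

# Both routes should give the same standard deviations and the same scores
all.equal(pca_manual$sdev, pca_builtin$sdev)
all.equal(abs(pca_manual$x), abs(pca_builtin$x))
```

The abs() in the last comparison is only a precaution, since the sign of a principal component is arbitrary.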

Upvotes: 3
