Blue Various
Blue Various

Reputation: 156

Principal component analysis using R. Automatic and manual results do not match

Two different methods of the principal component analysis were conducted to analyze the following data (ch082.dat) using the Box1's R-code, below.
https://drive.google.com/file/d/1xykl6ln-bUnXIs-jIA3n5S3XgHjQbkWB/view?usp=sharing

The first method uses the rotation matrix (See 'ans_mat' under the '#rotated data' of the Box1's code) and, the second method uses the 'pcomp' function (See 'rpca' under the '#rotated data' of the Box1's code).

However, there is a subtle discrepancy in the answer between the method using the rotation matrix and the method using the 'pcomp' function. make it match

My Question

What should I do so that the result of the rotation matrix -based method matches the result of the'pcomp' function?

As far as I've tried with various data, including other data, the actual discrepancies seem to be limited to scale shifts and mirroring transformations.

Mirror inversion can be seen in "ch082.dat" data.(See Fig.1); It seems that, in some j, the sign of the "jth eigenvector of the correlation matrix" and the sign of the "jth column of the output value of the prcomp function" may be reversed. If there is a degree of overlap in the eigenvalues, it is possible that the difference may be more complex than mirror inversion. enter image description here
Fig.1

There is a scale shift for the Box2's data (See See Fig.2), despite the centralization and normalization to the data. enter image description here
Fig.2

Box.1

#dataload
##Use the 'setwd' function to specify the directory containing 'ch082.dat'.
##For example, if you put this file directly under the C drive of your Windows PC, you can run the following command.

setwd("C:/") #Depending on where you put the file, you may need to change the path.
getwd()
w1<-read.table("ch082.dat",header = TRUE,row.names = 1,fileEncoding = "UTF-8")
w1

#Function for standardizing data
#Thanks to https://qiita.com/ohisama2/items/5922fac0c8a6c21fcbf8
standalize <- function(data)
{ for(i in length(data[1,]))
{
  x <- as.matrix(data[,i])
  y <- (x-mean(x)/sd(x))
  data[,i] <- y
}
  return(data)}


#Method using rotation matrix
z_=standalize(w1)
B_mat=cor(z_) #Compute correlation matrix
eigen_m <- eigen(B_mat)
sample_mat <- as.matrix(z_)
ans_mat=sample_mat

for(j in 1:length(sample_mat[1,])){
ans_mat[,j]=sample_mat%*%eigen_m$vectors[,j]
}

#Method using "rpca" function
rpca <- prcomp(w1,center=TRUE, scale=TRUE)

#eigen vectors
eigen_m$vectors
rpca

#rotated data
ans_mat
rpca$x

#Graph Plots
par(mfrow=c(1,2))
plot(
  ans_mat[,1], 
  ans_mat[,2], 
  main="Rotation using eigenvectors"
)
plot(rpca$x[,1], rpca$x[,2], 
     main="Principal component score")
par(mfrow=c(1,1))

#summary
summary(rpca)$importance

Box2.

sample_data <- data.frame(
  X = c(2,4, 6, 5,7, 8,10),
  Y = c(6,8,10,11,9,12,14)
)

X = c(2,4, 6, 5,7, 8,10)
Y = c(6,8,10,11,9,12,14)
plot(Y ~ X)

w1=sample_data 

Reference
https://logics-of-blue.com/principal-components-analysis/ (Written in Japanease)

Upvotes: 2

Views: 539

Answers (1)

dcarlson
dcarlson

Reputation: 11076

The two sets of results agree. First we can simplify your code a bit. You don't need your function or the for loop:

z_ <- scale(w1)
B_mat <- cor(z_)
eigen_m <- eigen(B_mat)
ans_mat <- z_ %*% eigen_m$vectors

Now the prcomp version

z_pca <- prcomp(z_)
z_pca$sdev^2    # Equals eigen_m$values
z_pca$rotation  # Equals eigen_m$vectors
z_pca$x         # Equals ans_mat

Your original code mislabeled ans_mat columns. They are actually the principal component scores. You can fix that with

colnames(ans_mat) <- colnames(z_pca$x)

The pc loadings (and therefore the scores) are not uniquely defined with respect to reflection. In other words multiplying all of the loadings or scores in one component by -1 flips them but does not change their relationships to one another. Multiply z_pca$x[, 1] by -1 and the plots will match:

z_pca$x[, 1] <- z_pca$x[, 1] * -1
dev.new(width=10, height=6)
par(mfrow=c(1,2))
plot(ans_mat[,1], ans_mat[,2], main="Rotation using eigenvectors")
plot(z_pca$x[,1], z_pca$x[,2], main="Principal component score")

PCA plots

Upvotes: 2

Related Questions