Reputation: 156
Two different methods of the principal component analysis were conducted to analyze the following data (ch082.dat) using the Box1's R-code, below.
https://drive.google.com/file/d/1xykl6ln-bUnXIs-jIA3n5S3XgHjQbkWB/view?usp=sharing
The first method uses the rotation matrix (See 'ans_mat' under the '#rotated data' of the Box1's code) and, the second method uses the 'pcomp' function (See 'rpca' under the '#rotated data' of the Box1's code).
However, there is a subtle discrepancy in the answer between the method using the rotation matrix and the method using the 'pcomp' function. make it match
My Question
What should I do so that the result of the rotation matrix -based method matches the result of the'pcomp' function?
As far as I've tried with various data, including other data, the actual discrepancies seem to be limited to scale shifts and mirroring transformations.
Mirror inversion can be seen in "ch082.dat" data.(See Fig.1);
It seems that, in some j, the sign of the "jth eigenvector of the correlation matrix" and the sign of the "jth column of the output value of the prcomp function" may be reversed. If there is a degree of overlap in the eigenvalues, it is possible that the difference may be more complex than mirror inversion.
Fig.1
There is a scale shift for the Box2's data (See See Fig.2), despite the centralization and normalization to the data.
Fig.2
Box.1
#dataload
##Use the 'setwd' function to specify the directory containing 'ch082.dat'.
##For example, if you put this file directly under the C drive of your Windows PC, you can run the following command.
setwd("C:/") #Depending on where you put the file, you may need to change the path.
getwd()
w1<-read.table("ch082.dat",header = TRUE,row.names = 1,fileEncoding = "UTF-8")
w1
#Function for standardizing data
#Thanks to https://qiita.com/ohisama2/items/5922fac0c8a6c21fcbf8
standalize <- function(data)
{ for(i in length(data[1,]))
{
x <- as.matrix(data[,i])
y <- (x-mean(x)/sd(x))
data[,i] <- y
}
return(data)}
#Method using rotation matrix
z_=standalize(w1)
B_mat=cor(z_) #Compute correlation matrix
eigen_m <- eigen(B_mat)
sample_mat <- as.matrix(z_)
ans_mat=sample_mat
for(j in 1:length(sample_mat[1,])){
ans_mat[,j]=sample_mat%*%eigen_m$vectors[,j]
}
#Method using "rpca" function
rpca <- prcomp(w1,center=TRUE, scale=TRUE)
#eigen vectors
eigen_m$vectors
rpca
#rotated data
ans_mat
rpca$x
#Graph Plots
par(mfrow=c(1,2))
plot(
ans_mat[,1],
ans_mat[,2],
main="Rotation using eigenvectors"
)
plot(rpca$x[,1], rpca$x[,2],
main="Principal component score")
par(mfrow=c(1,1))
#summary
summary(rpca)$importance
Box2.
sample_data <- data.frame(
X = c(2,4, 6, 5,7, 8,10),
Y = c(6,8,10,11,9,12,14)
)
X = c(2,4, 6, 5,7, 8,10)
Y = c(6,8,10,11,9,12,14)
plot(Y ~ X)
w1=sample_data
Reference
https://logics-of-blue.com/principal-components-analysis/
(Written in Japanease)
Upvotes: 2
Views: 539
Reputation: 11076
The two sets of results agree. First we can simplify your code a bit. You don't need your function or the for loop:
z_ <- scale(w1)
B_mat <- cor(z_)
eigen_m <- eigen(B_mat)
ans_mat <- z_ %*% eigen_m$vectors
Now the prcomp
version
z_pca <- prcomp(z_)
z_pca$sdev^2 # Equals eigen_m$values
z_pca$rotation # Equals eigen_m$vectors
z_pca$x # Equals ans_mat
Your original code mislabeled ans_mat
columns. They are actually the principal component scores. You can fix that with
colnames(ans_mat) <- colnames(z_pca$x)
The pc loadings (and therefore the scores) are not uniquely defined with respect to reflection. In other words multiplying all of the loadings or scores in one component by -1 flips them but does not change their relationships to one another. Multiply z_pca$x[, 1]
by -1 and the plots will match:
z_pca$x[, 1] <- z_pca$x[, 1] * -1
dev.new(width=10, height=6)
par(mfrow=c(1,2))
plot(ans_mat[,1], ans_mat[,2], main="Rotation using eigenvectors")
plot(z_pca$x[,1], z_pca$x[,2], main="Principal component score")
Upvotes: 2