Petrus

Reputation: 65

Same values for PCA Loadings results

I've recently performed a principal component analysis (PCA) for my master's thesis: I have 25 network datasets, formatted as graphs, and applied 5 measurements to each graph. The measurements were collected into a table where the rows are datasets and the columns are the results, as shown below:

[Image: table of the five measurement results per dataset]
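For context, a minimal sketch of how one row of this table can be computed (assuming the igraph package; sample_gnp stands in for one of the real networks):

library(igraph)

#a stand-in directed network; the real graphs come from the 25 datasets
g <- sample_gnp(100, 0.05, directed = TRUE)

#the five measurements applied to each graph
c(transitivity   = transitivity(g),
  reciprocity    = reciprocity(g),
  centralization = centr_degree(g)$centralization,
  density        = edge_density(g),
  assortativity  = assortativity_degree(g))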

I then scaled the results, intending to centre them to mean zero (following An Introduction to Statistical Learning, G. James, 2013), with this function:

dat <- data.frame(lapply(measures, function(x) scale(x, center = FALSE, scale = max(x, na.rm = TRUE)/100)))
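As a quick sketch of what this call actually does (with a made-up column x standing in for one of the measures):

#one measure column (hypothetical values)
x <- c(34, 8, 8, 37, 15, 29)
#the scaling used above: divide by max(x)/100, no centring
scaled_max <- scale(x, center = FALSE, scale = max(x, na.rm = TRUE)/100)
max(scaled_max)   #100: each column ends up with a maximum of 100
mean(scaled_max)  #not 0: center = FALSE leaves the mean untouched
#the default scale(x) standardises instead: mean 0, standard deviation 1
scaled_sd <- scale(x)
c(mean(scaled_sd), sd(scaled_sd))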

So each measure is scaled to have a maximum of 100, rather than being centred to mean zero or divided by its standard deviation (center = FALSE skips the centring; princomp with cor = TRUE standardises the variables internally anyway). I then applied PCA using the princomp function in R, princomp(dat, cor = T, scores = T), which returned these loading results:

Loadings:
                Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
Transitivity    0.585  0.412  0.246  0.136  0.640
Reciprocity     0.540 -0.145 -0.336 -0.750 -0.111
centralization -0.600  0.280        -0.582  0.469
density                0.327 -0.893  0.261  0.146
assortativity          0.790  0.159 -0.111 -0.581

                Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
SS loadings       1.0    1.0    1.0    1.0    1.0
Proportion Var    0.2    0.2    0.2    0.2    0.2
Cumulative Var    0.2    0.4    0.6    0.8    1.0

I would like to ask what would cause the SS loadings and Proportion Var rows to have exactly the same value for every component. I'm not sure if this is a discrepancy in my data or in the scaling method I'm using, or whether it is even something I should worry about. I see that someone had similar results in this query but did not discuss it, so perhaps it's normal? Any explanation of the impact this has would be much appreciated.

Biplot:

[Image: biplot of the first two principal components]

The scree plot also doesn't make much sense, since I expected an exponential drop-off; I assume this is a reflection of the loadings results. Scree plot:

[Image: scree plot with equal bar heights]

Upvotes: 0

Views: 575

Answers (2)

Schalkie

Reputation: 56

I suppose the first question you would like answered is what the SS loadings are. They are the sums of the squares of the loadings; geometrically, each is the squared length of a loading vector (the length of a vector being the square root of the sum of its squared components). From a technical perspective, the eigenvectors (or loadings) form a basis of R^5, and each has been normalised so that the sum of the squares of its elements, i.e. its squared length, equals 1. That is why every SS loading is exactly 1, and why the printed Proportion Var is 1/5 = 0.2 per component: the loadings printout divides each SS loading by the number of variables, not by the total variance. The actual variance explained by each component comes from the eigenvalues instead. You can think of the normalisation as a best practice of sorts, I suppose.

In short, I wouldn't be too bothered by this.
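To see this with princomp directly, here is a minimal sketch using the built-in mtcars data in place of your measures:

#fit a correlation-based PCA on five built-in variables
fit <- princomp(mtcars[, 1:5], cor = TRUE)

#each column of loadings is a unit-length eigenvector, so these are all 1
colSums(unclass(loadings(fit))^2)

#the real proportion of variance explained comes from the eigenvalues
fit$sdev^2 / sum(fit$sdev^2)
summary(fit)  #prints these same proportions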

I would suggest reproducing the result from first principles, as below.

#original data
df <- data.frame(transitivity = c(34, 8, 8, 37, 15, 29),
                 reciprocity = c(20, 34, 34, 25, 20, 7),
                 centralization = c(100, 99, 99, 100, 99, 99),
                 density = c(34, 7, 7, 2, 3, 0.7),
                 assortativity = c(-48, -53, -53, -33, 14, -45))
#scale according to the OP's procedure
dat <- data.frame(lapply(df, function(x) scale(x, center = FALSE, scale = max(x, na.rm = TRUE)/100)))
#calculate the correlation matrix
cormat <- cor(dat)
#diagonalise it
pca <- eigen(cormat)
#show that the eigenvectors are normalised
apply(pca$vectors, 2, function(x) sum(x^2)) #each value is 1 whether we use margin 1 or 2, since the matrix is orthogonal; a neat exercise to prove why
#calculate the % of variance explained by each component
pc_var <- pca$values / 5 * 100
barplot(pc_var)
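And as a cross-check against princomp (a sketch continuing from the code above):

#the eigenvalues equal the squared component standard deviations from princomp
fit <- princomp(dat, cor = TRUE)
all.equal(unname(fit$sdev^2), pca$values)
#summary() reports the actual proportion of variance, unlike the loadings printout
summary(fit)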

I am going to leave the interpretation of the results to you!

Upvotes: 1

Earl Mascetti

Reputation: 1336

I suggest you change packages and use FactoMineR. This way you bypass the scaling problem, because its PCA function has the option scale.unit (a boolean; if TRUE, the default, the data are scaled to unit variance).

Below is a quick example:

library(FactoMineR)
#mtcars is a built-in dataset, so no data() call is needed
mtcars_pca <- PCA(mtcars, scale.unit = TRUE)

This way you can check whether your result comes from your data or from a mistake in the scaling.
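For example, the eigenvalues and the percentage of variance per component can be read from the result like this (a sketch based on the mtcars_pca object above):

#each row of $eig holds an eigenvalue, its % of variance and the cumulative %
mtcars_pca$eig

#a scree plot of the percentages (column 2 of the $eig matrix)
barplot(mtcars_pca$eig[, 2])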

Here is a link to the package author's personal website, and here you can find videos about the package (all made by the author) with real examples.

Upvotes: 1
