This is rather a conceptual question for me. I have station discharge data. A reproducible example is shown below:
> A <- rnorm(n = 10)
> B <- rnorm(n = 10)
> C <- rnorm(n = 10)
> D <- rnorm(n = 10)
> Year <- seq(1981, 1990,1)
> df <- cbind.data.frame(A,B,C,D,Year)
> df
             A          B          C           D Year
1   0.01121438 -0.8051576 -0.3310504 -0.42942510 1981
2  -0.43391287  0.6532436  1.1708714  0.07139685 1982
3   0.97699859  0.3594398  1.1964296  0.21978991 1983
4   0.40884971 -0.1116279  1.3725900 -2.02855285 1984
5  -1.27919745 -1.4417479  0.5295565  0.75712199 1985
6  -0.62038250  0.4426559  0.8202428 -0.82079685 1986
7   0.09655825  0.9243231 -1.5198267  1.51316114 1987
8  -0.44051474  0.4399702 -0.7746237 -0.08734779 1988
9  -0.38922528 -0.6368451 -0.5187176 -0.04337179 1989
10 -0.67727348 -0.1201216 -1.6738859  0.64535227 1990
I need to perform PCA on this station-based data set. This is how I did it:
> pca <- prcomp(df[-5])
> pca
Standard deviations (1, .., p=4):
[1] 1.3529447 0.8232083 0.6621643 0.4783339
Rotation (n x k) = (4 x 4):
PC1 PC2 PC3 PC4
A -0.16040396 0.52718949 -0.14776084 -0.82128469
B 0.02441463 0.84464804 0.01609344 0.53452279
C -0.78797565 -0.01163029 0.61455083 0.03586638
D 0.59394349 0.09222617 0.77474836 -0.19618981
However, my colleagues told me that this is an incorrect way of performing PCA on spatially distributed stations. According to them, the result should instead have the following format:
> pca
   Year        PC1        PC2        PC3         PC4
1  1981 -0.4192441 -1.0005214  1.1762915  1.00558991
2  1982 -0.1544769  0.2806917  0.5949149 -0.41620086
3  1983 -0.3986093  1.5958552  0.2692496  0.74466221
4  1984  0.7621084 -0.3570052  0.7364546  1.68333910
5  1985  0.8328032  0.6859986  1.0685002  1.10491385
6  1986 -1.1903815 -0.4175560 -0.7729808  0.60896178
7  1987 -0.3834361  0.8492876 -0.1476914  0.67494475
8  1988 -1.1031614 -0.3747533  0.4373630  0.07006129
9  1989 -0.1118474 -1.1136896 -0.6296304 -2.68101223
10 1990  0.3307309  0.8234295 -0.9062146  0.58142017
So I should end up with as many PCs as there are stations, with one value of each PC per year. I am not sure how to get the desired result, either by changing the format of the data frame or through some option of prcomp that I am not aware of. Any help is appreciated. Thank you.
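For reference, one possible reading of the requested format is the matrix of per-year scores, which prcomp stores in the x component of its result (the printed output above only shows the rotation, i.e. the loadings). A minimal sketch under that assumption follows; the fixed seed and the name scores are additions for illustration, and it is not confirmed that this is what my colleagues mean:

```r
set.seed(1)  # assumption: fixed seed so the example is repeatable
df <- data.frame(A = rnorm(10), B = rnorm(10), C = rnorm(10), D = rnorm(10),
                 Year = 1981:1990)

# PCA on the station columns only (Year is excluded)
pca <- prcomp(df[, c("A", "B", "C", "D")])

# pca$rotation holds the loadings (stations x PCs);
# pca$x holds the scores (years x PCs), one row per year
scores <- data.frame(Year = df$Year, pca$x)
scores
```

This gives one row per year and one column per PC (PC1 to PC4), which matches the layout of the table above, though not necessarily its values or scaling.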