Sayantan4796

Reputation: 191

How to perform Principal Component Analysis for different discharge stations?

This is more of a conceptual question for me. I have discharge data for several stations. A reproducible example is shown below:

> A <- rnorm(n = 10)
> B <- rnorm(n = 10)
> C <- rnorm(n = 10)
> D <- rnorm(n = 10)
> Year <- seq(1981, 1990,1)
> df <- cbind.data.frame(A,B,C,D,Year)
> df
             A          B          C           D
1   0.01121438 -0.8051576 -0.3310504 -0.42942510
2  -0.43391287  0.6532436  1.1708714  0.07139685
3   0.97699859  0.3594398  1.1964296  0.21978991
4   0.40884971 -0.1116279  1.3725900 -2.02855285
5  -1.27919745 -1.4417479  0.5295565  0.75712199
6  -0.62038250  0.4426559  0.8202428 -0.82079685
7   0.09655825  0.9243231 -1.5198267  1.51316114
8  -0.44051474  0.4399702 -0.7746237 -0.08734779
9  -0.38922528 -0.6368451 -0.5187176 -0.04337179
10 -0.67727348 -0.1201216 -1.6738859  0.64535227
   Year
1  1981
2  1982
3  1983
4  1984
5  1985
6  1986
7  1987
8  1988
9  1989
10 1990

Now I need to perform PCA on the above station-based data set. I performed it as follows:

> pca <- prcomp(df[-5])
> pca
Standard deviations (1, .., p=4):
[1] 1.3529447 0.8232083 0.6621643 0.4783339

Rotation (n x k) = (4 x 4):
          PC1         PC2         PC3         PC4
A -0.16040396  0.52718949 -0.14776084 -0.82128469
B  0.02441463  0.84464804  0.01609344  0.53452279
C -0.78797565 -0.01163029  0.61455083  0.03586638
D  0.59394349  0.09222617  0.77474836 -0.19618981

But my colleagues told me that this is an incorrect way of performing PCA on spatially distributed stations. According to them, the resulting pca should have the following format:

> pca
   Year        PC1        PC2        PC3
1  1981 -0.4192441 -1.0005214  1.1762915
2  1982 -0.1544769  0.2806917  0.5949149
3  1983 -0.3986093  1.5958552  0.2692496
4  1984  0.7621084 -0.3570052  0.7364546
5  1985  0.8328032  0.6859986  1.0685002
6  1986 -1.1903815 -0.4175560 -0.7729808
7  1987 -0.3834361  0.8492876 -0.1476914
8  1988 -1.1031614 -0.3747533  0.4373630
9  1989 -0.1118474 -1.1136896 -0.6296304
10 1990  0.3307309  0.8234295 -0.9062146
           PC4
1   1.00558991
2  -0.41620086
3   0.74466221
4   1.68333910
5   1.10491385
6   0.60896178
7   0.67494475
8   0.07006129
9  -2.68101223
10  0.58142017

So, for each year I should have as many PCs as there are stations. I am not sure how to get the desired result, either by changing the data frame format or by using some feature of prcomp that I am not aware of. Any help is appreciated. Thank you.
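
For reference, here is a minimal sketch of one way to obtain a per-year table of that shape. It assumes the colleagues' table is the matrix of component scores, which may not be what they meant. prcomp returns the loadings in `rotation` (stations by PCs) and the scores in `x` (years by PCs). The `set.seed()` call and the name `scores` are additions for illustration only, so the numbers will not match the output printed above.

```r
# Assumption: set.seed() is added only so this sketch is repeatable;
# the values will differ from those shown in the question.
set.seed(1)
df <- data.frame(
  A = rnorm(10), B = rnorm(10), C = rnorm(10), D = rnorm(10),
  Year = seq(1981, 1990, 1)
)

# PCA on the four station columns only (Year is excluded).
pca <- prcomp(df[-5])

# pca$rotation: loadings, one row per station, one column per PC.
# pca$x:        scores, one row per year, one column per PC.
# "scores" is a hypothetical name for the combined per-year table.
scores <- cbind.data.frame(Year = df$Year, pca$x)
scores
```

If this is the intended layout, the data frame does not need to be reshaped: the same `prcomp(df[-5])` call already contains the per-year values, one column per station-derived PC.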

Upvotes: 4

Views: 47

Answers (0)
