Build loop to use increasing part of dataframe in R as input to function

Question

I'm using the first principal component from a PCA as an explanatory variable in a forecasting model that forecasts recursively using Kalman filtering. In other words, at each point in time, the model updates and produces a new forecast based on the newly included observation. Since PCA uses every observation it is given, I also need to run the PCA recursively, using only the observations available up to the point in time that I am forecasting from (otherwise, the PCA result could leak information about the future and help the model produce a more accurate forecast than it otherwise would). I think a loop might be the solution, but I am struggling with how to write the code.

As a more specific example, suppose I have the following data frame:

data <- as.data.frame(rbind(c(6,15,23),c(9,11,22), c(7,13,23), c(6,12,25),c(7,13,23)))
names(data) <- c("V1","V2","V3")

> data
  V1 V2 V3
1  6 15 23
2  9 11 22
3  7 13 23
4  6 12 25
5  7 13 23

At each observation date, I want to run a PCA (prcomp() from the stats package) on all observations up to and including that date. So I first want to run the PCA on the first two observations:

pca2 <- prcomp(data[1:2,], scale = TRUE)

Next, I want to run the PCA on the first, second and third observations:

pca3 <- prcomp(data[1:3,], scale = TRUE)

Then I want to run the PCA on the first, second, third and fourth observations:

pca4 <- prcomp(data[1:4,], scale = TRUE)

and so on, until the last run of the PCA, which includes all observations in the data frame. From each of these runs, I want to extract the last value of the first principal component (PC1), except for pca2, where I keep both the first and second values. I then want to combine these into a final data frame, where each monthly observation is the last PC1 value from the corresponding run.

The principal component outputs are:

> my_pca2 <- as.data.frame(pca2$x)
> my_pca2
        PC1           PC2
1 -1.224745 -5.551115e-17
2  1.224745  5.551115e-17

> my_pca3 <- as.data.frame(pca3$x)
> my_pca3
         PC1        PC2          PC3
1 -1.4172321 -0.2944338 6.106227e-16
2  1.8732448 -0.1215046 3.330669e-16
3 -0.4560127  0.4159384 4.163336e-16

> my_pca4 <- as.data.frame(pca4$x)
> my_pca4
          PC1         PC2          PC3
1 -1.03030993 -1.10154914  0.015457199
2  2.00769890  0.07649216  0.011670433
3  0.03301806 -0.24226508 -0.033461874
4 -1.01040702  1.26732205  0.006334242

So I want my final output to be a data frame that looks like this:

> final.output
         PC1
1  -1.224745
2   1.224745
3 -0.4560127
4 -1.01040702

Comment: yes, the first two values look a bit odd, but please don't pay too much attention to that. My point is that I want to build a data frame consisting of the last calculated value of the first principal component from each PCA run.

I am thinking that a for loop might be the best solution here, but I have not found any threads that bring me closer to working code. How can I make the loop use an increasing number of rows of the data frame in each iteration? Does anyone have any suggestions, tips or links? Any help on this is much appreciated!

Edward · Accepted Answer

I had a very similar approach.

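# One list slot per expanding window: rows 1:2, 1:3, ..., 1:nrow(data)
PCA <- vector("list", length = nrow(data) - 1)
for (i in seq_len(nrow(data) - 1)) {
  # The first window keeps both PC1 scores; later windows keep only the newest row
  j <- if (i == 1) 1:2 else i + 1
  PCA[[i]] <- as.data.frame(prcomp(data[1:(1 + i), ], scale = TRUE)$x)[j, 1]
}

# Flatten the list into a numeric vector of PC1 scores
unlist(PCA)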
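
The same idea can also be wrapped in a small function, which may be handy if the window should start later than row 2. This is only a sketch: expanding_pc1 and min_obs are names made up here, and unlike the loop above it returns just the newest PC1 score for every window (so one value for the first window, not two). Keep in mind that the sign of a principal component is arbitrary, so scores from consecutive windows can flip sign, and that prcomp() with scaling stops with an error if a column is constant within a window.

```r
# Expanding-window PCA: PC1 score of the newest observation in each window.
# `expanding_pc1` and `min_obs` are illustrative names, not from the question.
expanding_pc1 <- function(df, min_obs = 2) {
  ends <- seq(min_obs, nrow(df))            # last row index of each window
  last_scores <- vapply(ends, function(n) {
    # Fit the PCA only on rows 1..n, so no later observation is used
    scores <- prcomp(df[seq_len(n), , drop = FALSE], scale. = TRUE)$x
    unname(scores[n, 1])                    # PC1 score of row n
  }, numeric(1))
  data.frame(n_obs = ends, PC1 = last_scores)
}

# Example data from the question
data <- as.data.frame(rbind(c(6, 15, 23), c(9, 11, 22), c(7, 13, 23),
                            c(6, 12, 25), c(7, 13, 23)))
names(data) <- c("V1", "V2", "V3")

final.output <- expanding_pc1(data)
final.output
```

The n_obs column records how many rows each PCA was fitted on, which makes it easy to merge the scores back onto the observation dates.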
