EKSK

Reputation: 212

R / Rolling Regression with extended Data Frame

Hello, I'm currently working on a regression analysis with the following code:

r2.out <- numeric(ncol(Ret1))  # preallocate one R^2 per column
for (i in 1:ncol(Ret1)) {
  r2.out[i] <- summary(lm(Ret1[, 1] ~ Ret1[, i]))$r.squared
}
r2.out

This code runs a simple OLS regression of the first column of the data frame on each column in turn and returns the R^2 of each regression. At the moment each regression uses all data points in a column. What I need instead is for the code to use a rolling window of data points, so that it calculates the R^2 for a rolling window of 30 days over the entire time frame. The output should be a matrix with the R^2 per rolling window for each (1, i) pair.

This code does the rolling regression part but does not run the regression for each (1, i) pair:

dolm <- function(x) summary(lm(Ret1[, 1] ~ Ret1[, i]))$r.squared
rollapplyr(Ret1, 30, dolm, by.column = FALSE)

I really appreciate any help you can provide.

Upvotes: 2

Views: 311

Answers (1)

G. Grothendieck

Reputation: 269596

Using the built-in anscombe data frame we regress the y1 column against x1 and then x2, etc. We use a width of 3 here for purposes of illustration.

xnames should be set to the names of the x variables. In the anscombe data set the column names that begin with x are the x variables. As another example, if all the columns are x variables except the first then xnames <- names(DF)[-1] could be used.

We define an R squared function, rsq, which takes the indexes to use, ix, and the x variable name, xname. We then sapply over xnames and, for each one, rollapply over the indices 1:n.

library(zoo)

xnames <- grep("x", names(anscombe), value = TRUE)
n <- nrow(anscombe)
w <- 3
rsq <- function(ix, xname) summary(lm(y1 ~ ., anscombe[c("y1", xname)], subset = ix))$r.squared
sapply(xnames, function(xname) rollapply(1:n, w, rsq, xname = xname))

giving the following result of dimensions n - w + 1 by length(xnames):

                x1           x2           x3        x4
 [1,] 2.285384e-01 2.285384e-01 2.285384e-01 0.0000000
 [2,] 3.591782e-05 3.591782e-05 3.591782e-05 0.0000000
 [3,] 9.841920e-01 9.841920e-01 9.841920e-01 0.0000000
 [4,] 5.857410e-01 5.857410e-01 5.857410e-01 0.0000000
 [5,] 9.351609e-01 9.351609e-01 9.351609e-01 0.0000000
 [6,] 8.760332e-01 8.760332e-01 8.760332e-01 0.7724447
 [7,] 9.494869e-01 9.494869e-01 9.494869e-01 0.7015512
 [8,] 9.107256e-01 9.107256e-01 9.107256e-01 0.3192194
 [9,] 8.385510e-01 8.385510e-01 8.385510e-01 0.0000000
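To map this back onto the layout described in the question, here is a minimal sketch assuming a data frame whose first column is the response and whose remaining columns are the regressors. The question does not show Ret1, so the Ret1 below is simulated stand-in data with made-up column names, and rsq_pair and r2.roll are hypothetical names used only for illustration. The window width is 30, as requested.

```r
library(zoo)

# Simulated stand-in for the question's Ret1 (assumption: the first column is
# the response and all remaining columns are regressors).
set.seed(1)
Ret1 <- as.data.frame(matrix(rnorm(100 * 4), ncol = 4,
                             dimnames = list(NULL, c("r1", "r2", "r3", "r4"))))

yname  <- names(Ret1)[1]   # response: first column
xnames <- names(Ret1)[-1]  # regressors: every other column
n <- nrow(Ret1)
w <- 30

# R squared of yname regressed on xname, using only the rows in ix
rsq_pair <- function(ix, xname) {
  fit <- lm(reformulate(xname, response = yname), data = Ret1[ix, ])
  summary(fit)$r.squared
}

# One column per regressor, one row per window
r2.roll <- sapply(xnames, function(xname) rollapply(1:n, w, rsq_pair, xname = xname))
dim(r2.roll)
```

Each row of r2.roll corresponds to one window of 30 consecutive rows, so the matrix has n - w + 1 rows and length(xnames) columns.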

Variations

1) It would also be possible to reverse the order of the rollapply and sapply replacing the last line of code with:

rollapply(1:n, w, function(ix) sapply(xnames, rsq, ix = ix))

2) Another variation is to replace the definition of rsq and the sapply/rollapply line with the following single statement. It may be a bit harder to read so you may prefer the first solution but it does entail one simplification -- namely, xname need no longer be an explicit argument of the inner anonymous function (which takes the place of rsq above):

sapply(xnames, function(xname) rollapply(1:n, w, function(ix)
    summary(lm(y1 ~ ., anscombe[c("y1", xname)], subset = ix))$r.squared))
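As a cross-check, note that each regression here has a single regressor plus an intercept, and in that case R squared equals the squared Pearson correlation of the two columns within the window. The sketch below relies on that identity and skips lm entirely; rsq_cor and r2.cor are hypothetical names. x4 is left out because it is constant within most windows of width 3, where the correlation is undefined and cor would return NA.

```r
library(zoo)

xnames <- c("x1", "x2", "x3")  # x4 omitted: constant within most windows
n <- nrow(anscombe)
w <- 3

# Squared correlation between y1 and xname over the rows in ix
rsq_cor <- function(ix, xname) cor(anscombe$y1[ix], anscombe[[xname]][ix])^2

r2.cor <- sapply(xnames, function(xname) rollapply(1:n, w, rsq_cor, xname = xname))
r2.cor
```

The x1, x2 and x3 columns should agree with the lm-based result up to floating-point error.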

Update: Have fixed line which is now n <- nrow(anscombe)

Upvotes: 1
